afs/Documentation/filesystems cachefs.txt,1.1,1.2
dwh at infradead.org
Tue Jul 15 15:51:20 BST 2003
Update of /home/cvs/afs/Documentation/filesystems
In directory phoenix.infradead.org:/tmp/cvs-serv2936/Documentation/filesystems
Modified Files:
cachefs.txt
Log Message:
documentation update
Index: cachefs.txt
===================================================================
RCS file: /home/cvs/afs/Documentation/filesystems/cachefs.txt,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- cachefs.txt 30 Jan 2003 11:30:38 -0000 1.1
+++ cachefs.txt 15 Jul 2003 13:51:17 -0000 1.2
@@ -1,22 +1,88 @@
- ===========================
- CacheFS: Caching Filesystem
- ===========================
+ ===========================
+ CacheFS: Caching Filesystem
+ ===========================
========
OVERVIEW
========
+CacheFS is a general purpose cache for network filesystems, though it could be
+used for caching other things such as ISO9660 filesystems too.
+
+CacheFS uses a block device directly rather than a directory under a
+filesystem. This means it can perform its own journalling more efficiently, and
+is not beholden to the underlying filesystem. If necessary, however, a file can
+be loopback mounted as a cache.
+
+CacheFS does not follow the idea of completely loading every netfs file opened
+into the cache before it can be operated upon, and then serving the pages out
+of the cachefs rather than the netfs because:
+
+ (1) It must be practical to operate without a cache.
+
+ (2) The size of any accessible file must not be limited to the size of the
+ cache.
+
+ (3) The combined size of all opened files (this includes mapped libraries)
+ must not be limited to the size of the cache.
+
+ (4) The user should not be forced to download an entire file just to do a
+ one-off access of a small portion of it.
+
+It rather serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+
+
+CacheFS provides the following facilities:
+
+ (1) More than one block device can be mounted as a cache.
+
+ (2) Caches can be mounted / unmounted at any time.
+
+ (3) The netfs is provided with an interface that allows either party to
+ withdraw caching facilities from a file (required for (2)).
+
+ (4) The interface to the netfs returns as few errors as possible, preferring
+ rather to let the netfs remain oblivious where possible.
+
+ (5) Cookies are used to represent files and indexes to the netfs. The simplest
+ cookie is just a NULL pointer - indicating nothing cached there.
+
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+ desires, though it must be aware that the index search function is
+ recursive and stack space is limited.
+
+ (7) Data I/O is done direct to and from the netfs's pages. The netfs indicates
+ that page A is at index B of the data-file represented by cookie C, and
+ that it should be read or written. CacheFS may or may not start I/O on
+ that page, but if it does, a netfs callback will be invoked to indicate
+ completion.
+
+ (8) Cookies can be "retired" upon release. At this point CacheFS will mark
+ them as obsolete and the index hierarchy rooted at that point will get
+ recycled.
+
+ (9) The netfs provides a "match" function for index searches. In addition to
+ saying whether a match was made or not, this can also specify that an
+ entry should be updated or deleted.
+
+(10) All metadata modifications (this includes index contents) are performed
+ as journalled transactions. These are replayed on mounting.
+
+
+======================
+GENERAL ON-DISC LAYOUT
+======================
+
The filesystem is divided into a number of parts:
0 +---------------------------+
| Superblock |
1 +---------------------------+
- | Full-BA-Bitmap Bitmap | <-- also referred to as the "Fullmap"
- +---------------------------+
- | Block Allocation Bitmap |
- +---------------------------+
| Update Journal |
+---------------------------+
+ | Validity Journal |
+ +---------------------------+
| Write-Back Journal |
+---------------------------+
| |
@@ -27,19 +93,22 @@
The superblock contains the filesystem ID tags and pointers to all the other
regions.
-There's a block allocation bitmap that has bits set according to which blocks
-in the data region are currently in use. There's also another bitmap which has
-a bit set for every block in the block allocation bitmap that's entirely
-full. This is to make allocating a block faster.
-
The update journal consists of a set of entries of sector size that keep track
of what changes have been made to the on-disc filesystem, but not yet
committed.
+The validity journal contains records of data blocks that have been allocated
+but not yet written. Upon journal replay, all these blocks will be detached
+from their pointers and recycled.
+
The writeback journal keeps track of changes that have been made locally to
-data blocks, but that have not yet been committed back to the server.
+data blocks, but that have not yet been committed back to the server. This is
+not yet implemented.
+
+The journals are replayed upon mounting to make sure that the cache is in a
+reasonable state.
-The data region holds three things:
+The data region holds a number of things:
(1) Index Files
@@ -47,15 +116,20 @@
that wish to cache data here (such as AFS) to keep track of what's in
the cache at any given time.
- The first index file (inode 1) is special. It has an entry (the storage
- management record) for each file in the cache.
+ The first index file (inode 1) is special. It holds the cachefs-specific
+ metadata for every file in the cache (including direct, single-indirect
+ and double-indirect block pointers).
The second index file (inode 2) is also special. It has an entry for
each filesystem that's currently holding data in this cache.
+ Every allocated entry in an index has an inode bound to it. This inode is
+ either another index file or it is a data file.
+
(2) Cached Data Files
- These are caches of files from remote servers.
+ These are caches of files from remote servers. Holes in these files
+ represent blocks not yet obtained from the server.
(3) Indirection Blocks
@@ -67,113 +141,603 @@
- single indirection
- double indirection
- - triple indirection
-
+
+ (4) Allocation Nodes and Free Blocks
+
+ The free blocks of the filesystem are kept in two single-branched
+ "trees". One tree is the blocks that are ready to be allocated, and the
+ other is the blocks that have just been recycled. When the former tree
+ becomes empty, the latter tree is decanted across.
+
+ Each tree is arranged as a chain of "nodes". Each node points to the next
+ node in the chain (unless it's at the end) and also to up to 1022 free
+ blocks.
+
+Note that all blocks are PAGE_SIZE in size. The blocks are numbered starting
+with the superblock at 0. Using 32-bit block pointers, a maximum of 0xffffffff
+blocks can be accessed, meaning that the maximum cache size is ~16TB for 4KB
+pages.
+
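The size limit and the node capacity quoted above can be checked with a little arithmetic. The following is only a sketch, assuming 4KB pages: the structure name, its field names and the bookkeeping word are hypothetical and are not taken from cachefs-layout.h. Only the 1022-leaf figure and the 32-bit block pointers come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE   4096u  /* assumption: 4KB pages */
#define SKETCH_NODE_LEAVES 1022u  /* from the text: up to 1022 free blocks per node */

/* Hypothetical layout of one allocation node (names are illustrative only). */
struct sketch_alloc_node {
	uint32_t next;                        /* block number of the next node in the chain */
	uint32_t count;                       /* assumed bookkeeping word */
	uint32_t leaves[SKETCH_NODE_LEAVES];  /* block numbers of free blocks */
};

/* 1022 leaves plus two 32-bit words fill exactly one 4KB block. */
static_assert(sizeof(struct sketch_alloc_node) == SKETCH_PAGE_SIZE,
	      "node must fill exactly one block");

/* Largest cache addressable with 32-bit block pointers, in bytes. */
static uint64_t sketch_max_cache_bytes(void)
{
	return (uint64_t)0xffffffffu * SKETCH_PAGE_SIZE;
}
```

With these assumptions the limit is 4096 bytes short of 2^44 bytes, which is where the "~16TB" figure comes from.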
========
MOUNTING
========
-Since CacheFS is actually a filesystem, the way you give it a device to cache
-is to mount it as cachefs type on a directory somewhere. The mounted
-filesystem will then present the user with a set of directories (one per
-cached network fs), each of which will contain a set of files that grant read
-access to the indexes contained within.
+Since CacheFS is actually a quasi-filesystem, it requires a block device behind
+it. The way to give it one is to mount it as cachefs type on a directory
+somewhere. The mounted filesystem will then present the user with a set of
+directories outlining the index structure resident in the cache. Indexes
+(directories) and files can be turfed out of the cache by the sysadmin through
+the use of rmdir and unlink.
For instance, if a cache contains AFS data, the user might see the following:
- root>mount -tcachefs /dev/hdg5 /cache-hdg5/
- root>ls -1 /cache-hdg5
+ root>mount -t cachefs /dev/hdg9 /cache-hdg9
+ root>ls -1 /cache-hdg9
afs
- root>ls -1 /cache-hdg5/afs
- cells
- vldb
- files0
- files1
- files2
- ...
- files127
-
-
-==========================
-PHYSICAL ACCESS MANAGEMENT
-==========================
-
-All blocks are PAGE_SIZE in size. The blocks are numbered starting with the
-superblock at 0. Using 32-bit block pointers, a maximum number of 0xffffffff
-blocks can be accessed, meaning that the maximum cache size is ~16Tb for 4Kb
-pages.
-
-The journals, however, are managed in smaller chunks. These reflect the
-underlying atomic sector size of the backing device, and will typically be
-512b (so there will be 8 journal entries per page).
-
-Absent blocks in cached data files do not imply a block of zeros, rather they
-indicate a block that has yet to be fetched from the server.
-
-Since VM pages full of cached data actually belong to whichever network
-filesystem is presenting them to the user, they can't also belong to
-CacheFS. To get around this problem, CacheFS lets the network filesystem tell
-it when to read or write data.
-
-
-==============
-FILESYSTEM API
-==============
-
-There is an API provided by CacheFS that grants network filesystems access to
-any cached data. Each network filesystem declares to CacheFS what indexes it
-requires in a cache, and how many files to allocate per index. For instance,
-AFS would declare:
-
- INDEX QTY CONTENTS
- ======= ======= =====================================
- cells 1 Cached cell information
- vldb 1 Cached volume location information
- files 128 Cached file information (hash table)
-
-Each index definition must also be provided with the following operations:
-
- - a hashing function (should QTY be more than 1)
- - a function to say whether an entry is in use
- - a key comparison function (for searching)
- - an entry update function
- - an entry clear function
- - an access time update function
-
-Index entries can be of any size from 4b up to PAGE_SIZE. Each page is divided
-into as many entries as possible, and the free space at the end is
-ignored. This makes management easier as no entry will be split across pages.
-
-When the network filesystem wants to cache a file, it asks CacheFS for a
-handle. CacheFS walks through the indexes looking for the file (it may already
-be in there) and fills in the handle if it's found. If it wasn't found, then
-CacheFS will allocate new entries in any indexes that the appropriate ones
-don't exist, and fill the handle in with this information.
-
-Multiple caches are searched in order until one returns a match. Matches in
-other caches are ignored, and will eventually be discarded.
-
-When a network filesystem allocates a page from a file, it asks CacheFS to try
-and make an association with a block in the cache. If the association already
-exists on disc, CacheFS loads the data into the page, and the page is ready
-for use. If not, CacheFS reserves a block in the cache into which the contents
-of that page will be written when the network fs has downloaded it from the
-server.
-
-If the page is changed, then the VM will eventually ask the network to write
-the page back to storage. At this point, the network filesystem will ask
-CacheFS to write the page to disc into the block reserved for it.
-
-Additionally, when a page is written back, CacheFS writes an entry back into
-the writeback journal. When the network fs has finished uploading the page to
-the server, it should tell CacheFS so that the writeback journal can be
-further marked to show operation completion.
-
-When a network fs first associates with a cache, CacheFS tells it about all
-the blocks with outstanding writebacks journalled. The network fs can then
-either send them to the server again, or tell CacheFS to discard the pages.
-
-A network filesystem can also ask CacheFS to either delete or invalidate a
-file, thus discarding all the data attached, and either discarding or updating
-the metadata too.
+ root>ls -1 /cache-hdg9/afs
+ cambridge.redhat.com
+ root>ls -1 /cache-hdg9/afs/cambridge.redhat.com
+ root.afs
+ root.cell
+
+However, a block device that's going to be used for a cache must be prepared
+before it can be mounted initially. This is done very simply by:
+
+ echo "cachefs___" >/dev/hdg9
+
+During the initial mount, the basic structure will be scribed into the cache,
+and then a background thread will "recycle" the as-yet unused data blocks.
+
+
+======================
+NETWORK FILESYSTEM API
+======================
+
+There is, of course, an API by which a network filesystem can make use of the
+CacheFS facilities. This is based around a number of principles:
+
+ (1) Every file and index is represented by a cookie. This cookie may or may
+ not have anything associated with it, but the netfs doesn't need to care.
+
+ (2) Barring the top-level index (one entry per cached netfs), the index
+ hierarchy for each netfs is structured according to the whim of the netfs.
+
+ (3) Any netfs page being backed by the cache must have a small token
+ associated with it (possibly pointed to by page->private) so that CacheFS
+ can keep track of it.
+
+This API is declared in <linux/cachefs.h>.
+
+
+NETWORK FILESYSTEM DEFINITION
+-----------------------------
+
+CacheFS needs a description of the network filesystem. This is specified using
+a record of the following structure:
+
+ struct cachefs_netfs {
+ const char *name;
+ unsigned version;
+ struct cachefs_netfs_operations *ops;
+ struct cachefs_cookie *primary_index;
+ ...
+ };
+
+The first three fields should be filled in before registration, and the fourth
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+
+The fields are:
+
+ (1) The name of the netfs (used as the key in the toplevel index).
+
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+ entire on-disc hierarchy for this netfs will be scrapped and begun
+ afresh).
+
+ (3) The operations table is defined as follows:
+
+ struct cachefs_netfs_operations {
+ int (*get_page_cookie)(struct page *page,
+ struct cachefs_page **_page_cookie);
+ };
+
+ The functions here must all be present. Currently the only one is:
+
+ (a) get_page_cookie(): Get the token used to bind a page to a block in a
+ cache. This function should allocate it if it doesn't exist.
+
+ Return -ENOMEM if there's not enough memory and -ENODATA if the page
+ just shouldn't be cached.
+
+ Set *_page_cookie to point to the token and return 0 if there is now a
+ cookie. Note that the netfs must keep track of the cookie itself (and
+ free it later). page->private can be used for this (see below).
+
+ (4) The cookie representing the primary index will be allocated according to
+ another parameter passed into the registration function.
+
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+
+ static struct cachefs_netfs_operations afs_cache_ops = {
+ .get_page_cookie = afs_cache_get_page_cookie,
+ };
+
+ struct cachefs_netfs afs_cache_netfs = {
+ .name = "afs",
+ .version = 0,
+ .ops = &afs_cache_ops,
+ };
+
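The contract for get_page_cookie() described above (allocate the token on first use, hand back the same token afterwards, and return -ENOMEM or -ENODATA otherwise) might be met along the following lines. This is a userspace sketch, not the kAFS implementation: struct page and struct cachefs_page are cut-down stand-ins for the kernel types, and the has_private and cacheable fields are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel types; only what the sketch needs. */
struct cachefs_page { int unused; };

struct page {
	unsigned long private;  /* mirrors page->private */
	int has_private;        /* stands in for the PG_private bit */
	int cacheable;          /* hypothetical netfs policy flag */
};

/* Return the page's token, allocating it on first use. */
static int sketch_get_page_cookie(struct page *page,
				  struct cachefs_page **_page_cookie)
{
	struct cachefs_page *token;

	assert(page && _page_cookie);

	if (!page->cacheable)
		return -ENODATA;        /* this page just shouldn't be cached */

	if (page->has_private) {        /* already allocated: present the same token */
		*_page_cookie = (struct cachefs_page *)page->private;
		return 0;
	}

	token = calloc(1, sizeof(*token));
	if (!token)
		return -ENOMEM;

	page->private = (unsigned long)token;  /* the netfs keeps track of the token */
	page->has_private = 1;
	*_page_cookie = token;
	return 0;
}
```

The netfs remains responsible for freeing the token later, once the page has been uncached.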
+
+INDEX DEFINITION
+----------------
+
+Indexes are used for two purposes:
+
+ (1) To speed up the finding of a file based on a series of keys (such as AFS's
+ "cell", "volume ID", "vnode ID").
+
+ (2) To make it easier to discard a subset of all the files cached based around
+ a particular key - for instance to mirror the removal of an AFS volume.
+
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, CacheFS tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree. The netfs can even mix indexes and data files at the same level, but
+it's not recommended.
+
+There are some limits on indexes:
+
+ (1) All entries in any given index must be the same size. An array of such
+ entries needn't fit exactly into a page, but they will not be laid across
+ a page boundary.
+
+ The netfs supplies a blob of data for each index entry, and CacheFS
+ provides an inode number and a flag.
+
+ (2) The entries in one index can be of a different size to the entries in
+ another index.
+
+ (3) The entry data must be journallable, and thus must be able to fit into an
+ update journal entry - this limits the maximum size to a little over 400
+ bytes at present.
+
+ (4) The data should start with the key. The layout of the key is described in
+ the index definition, and this is used to display the key in some
+ appropriate way.
+
+ (5) The depth of the index tree should be judged with care as the search
+ function is recursive. Too many layers will run the kernel out of stack.
+
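Limit (1) above implies a simple placement rule: each page holds as many whole entries as fit, and the slack at the end of the page is left unused. The sketch below illustrates that rule, assuming 4KB pages. The function names are hypothetical, and entry_size here means the full on-disc entry size (the netfs blob plus whatever CacheFS adds for the inode number and flag), which the text does not quantify.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4KB pages */

/* Number of whole entries that fit in one page; the tail slack is unused. */
static unsigned sketch_entries_per_page(unsigned entry_size)
{
	assert(entry_size > 0 && entry_size <= SKETCH_PAGE_SIZE);
	return SKETCH_PAGE_SIZE / entry_size;
}

/* Byte offset of entry number n within the index file. */
static unsigned long sketch_entry_offset(unsigned long n, unsigned entry_size)
{
	unsigned per_page = sketch_entries_per_page(entry_size);

	return (n / per_page) * SKETCH_PAGE_SIZE +
	       (n % per_page) * entry_size;
}
```

For instance, with 400-byte entries each page holds 10 entries and wastes 96 bytes, so entry 10 starts at offset 4096 rather than 4000.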
+To define an index, a structure of the following type should be filled out:
+
+ struct cachefs_index_def
+ {
+ u_int8_t name[8];
+ u_int16_t data_size;
+ struct {
+ u_int8_t type;
+ u_int16_t len;
+ } keys[4];
+
+ cachefs_match_val_t (*match)(void *target_netfs_data,
+ const void *entry);
+
+ void (*update)(void *source_netfs_data, void *entry);
+ };
+
+This has the following fields:
+
+ (1) The name of the index (NUL terminated unless all 8 chars are used).
+
+ (2) The size of the data blob provided by the netfs.
+
+ (3) A definition of the key(s) at the beginning of the blob. The netfs is
+ permitted to specify up to four keys. The total length must not exceed the
+ data size. It is assumed that the keys will be laid end to end in order,
+ starting with the first byte of the data.
+
+ The type field specifies the way the data should be displayed. It can be
+ one of:
+
+ (*) CACHEFS_INDEX_KEYS_NOTUSED - key field not used
+ (*) CACHEFS_INDEX_KEYS_BIN - display byte-by-byte in hex
+ (*) CACHEFS_INDEX_KEYS_ASCIIZ - NUL-terminated ASCII
+ (*) CACHEFS_INDEX_KEYS_IPV4ADDR - display as IPv4 address
+ (*) CACHEFS_INDEX_KEYS_IPV6ADDR - display as IPv6 address
+
+ (4) A function to match an in-page-cache index entry blob to netfs data passed
+ to the cookie acquisition function by the netfs. This function can also be
+ used to extract data from the blob and copy it into the netfs's
+ structures.
+
+ The return values this function can make are:
+
+ (*) CACHEFS_MATCH_FAILED - failed to match
+ (*) CACHEFS_MATCH_SUCCESS - successful match
+ (*) CACHEFS_MATCH_SUCCESS_UPDATE - successful match, entry needs update
+ (*) CACHEFS_MATCH_SUCCESS_DELETE - entry should be deleted
+
+ For example, in linux/fs/afs/vnode.c:
+
+ static cachefs_match_val_t
+ afs_vnode_cache_match(void *target, const void *entry)
+ {
+ const struct afs_cache_vnode *cvnode = entry;
+ struct afs_vnode *vnode = target;
+
+ if (vnode->fid.vnode != cvnode->vnode_id)
+ return CACHEFS_MATCH_FAILED;
+
+ if (vnode->fid.unique != cvnode->vnode_unique ||
+ vnode->status.version != cvnode->data_version)
+ return CACHEFS_MATCH_SUCCESS_DELETE;
+
+ return CACHEFS_MATCH_SUCCESS;
+ }
+
+ (5) A function to initialise or update an in-page-cache index entry blob from
+ netfs data passed to CacheFS by the netfs. This function should not assume
+ that there's any data yet in the in-page-cache.
+
+ Continuing the above example:
+
+ static void afs_vnode_cache_update(void *source, void *entry)
+ {
+ struct afs_cache_vnode *cvnode = entry;
+ struct afs_vnode *vnode = source;
+
+ cvnode->vnode_id = vnode->fid.vnode;
+ cvnode->vnode_unique = vnode->fid.unique;
+ cvnode->data_version = vnode->status.version;
+ }
+
+To finish the above example, the index definition for the "vnode" level is as
+follows:
+
+ struct cachefs_index_def afs_vnode_cache_index_def = {
+ .name = "vnode",
+ .data_size = sizeof(struct afs_cache_vnode),
+ .keys[0] = { CACHEFS_INDEX_KEYS_BIN, 4 },
+ .match = afs_vnode_cache_match,
+ .update = afs_vnode_cache_update,
+ };
+
+The first element of struct afs_cache_vnode is the vnode ID.
+
+And for contrast, the cell index definition is:
+
+ struct cachefs_index_def afs_cache_cell_index_def = {
+ .name = "cell_ix",
+ .data_size = sizeof(afs_cell_t),
+ .keys[0] = { CACHEFS_INDEX_KEYS_ASCIIZ, 64 },
+ .match = afs_cell_cache_match,
+ .update = afs_cell_cache_update,
+ };
+
+The cell index is the primary index for kAFS.
+
+
+NETWORK FILESYSTEM (UN)REGISTRATION
+-----------------------------------
+
+The first step is to declare the network filesystem to the cache. This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+
+The registration function is:
+
+ int cachefs_register_netfs(struct cachefs_netfs *netfs,
+ struct cachefs_index_def *primary_idef);
+
+It just takes pointers to the netfs definition and the primary index
+definition. It returns 0 or an error as appropriate.
+
+For kAFS, registration is done as follows:
+
+ ret = cachefs_register_netfs(&afs_cache_netfs,
+ &afs_cache_cell_index_def);
+
+The last step is, of course, unregistration:
+
+ void cachefs_unregister_netfs(struct cachefs_netfs *netfs);
+
+
+INDEX REGISTRATION
+------------------
+
+The second step is to get cookies to represent the other indexes that are
+required to find files. This involves requesting that a new index entry be made
+in some already existing index:
+
+ void cachefs_acquire_cookie(struct cachefs_cookie *iparent,
+ struct cachefs_index_def *idef,
+ void *netfs_data,
+ struct cachefs_cookie **_cookie);
+
+This function creates an index entry in the index represented by iparent,
+loading the associated blob by calling iparent's update method with the
+supplied netfs_data.
+
+It also creates a new index inode, formatted according to the definition
+supplied in idef. The new cookie is then returned in *_cookie.
+
+Note that this function never returns an error - all errors are handled
+internally. It may also yield a NULL cookie in *_cookie. It is quite acceptable
+to pass such a cookie back to this function as iparent (or even to the
+relinquish cookie, read page and write page functions - see below).
+
+Note also that no indexes are actually created on disc until a data file needs
+to be created somewhere down the hierarchy. Furthermore, an index may be
+created in several different caches independently at different times. This is
+all handled transparently, and the netfs doesn't see any of it.
+
+For example, with AFS, a cell would be added to the primary index. This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+
+ cachefs_acquire_cookie(afs_cache_netfs.primary_index,
+ &afs_vlocation_cache_index_def,
+ cell,
+ &cell->cache);
+
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+
+ cachefs_acquire_cookie(cell->cache,
+ &afs_volume_cache_index_def,
+ vlocation,
+ &vlocation->cache);
+
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+
+ cachefs_acquire_cookie(vlocation->cache,
+ &afs_vnode_cache_index_def,
+ volume,
+ &volume->cache);
+
+
+DATA FILE REGISTRATION
+----------------------
+
+The third step is to request a data file be created in the cache. This is
+almost identical to index cookie acquisition. The only difference is that a
+NULL index definition is passed.
+
+ cachefs_acquire_cookie(volume->cache,
+ NULL,
+ vnode,
+ &vnode->cache);
+
+
+
+PAGE ALLOC/READ/WRITE
+---------------------
+
+And the fourth step is to propose a page be cached. There are two functions
+that are used to do this.
+
+Firstly, the netfs should ask CacheFS to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+
+ typedef
+ void (*cachefs_rw_complete_t)(void *cookie_data,
+ struct page *page,
+ void *end_io_data,
+ int error);
+
+ int cachefs_read_or_alloc_page(struct cachefs_cookie *cookie,
+ struct page *page,
+ cachefs_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified will
+have the data loaded into it (and is also used to specify the page number), and
+the gfp argument is used to control how any memory allocations made are satisfied.
+
+If the cookie indicates the inode is not cached:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a copy of the page resident on disc:
+
+ (1) The function will submit a request to read the data off the disc directly
+ into the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the read is complete, end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The data passed to the above function.
+
+ (*) An argument that's 0 on success or negative for an error.
+
+ If an error occurs, it should be assumed that the page contains no usable
+ data.
+
+Otherwise, if there's not a copy available on disc:
+
+ (1) A block may be allocated in the cache and attached to the inode at the
+ appropriate place.
+
+ (2) The validity journal will be marked to indicate this page does not yet
+ contain valid data.
+
+ (3) The function will return -ENODATA.
+
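The three outcomes of cachefs_read_or_alloc_page() described above suggest a simple dispatch in the netfs's read path. The following sketch models only that decision. The enum and function names are hypothetical, and treating any other negative value as a plain failure is an assumption, since the text lists only 0, -ENOBUFS and -ENODATA.

```c
#include <assert.h>
#include <errno.h>

/* What the netfs might do next, given the return value (names are illustrative). */
enum sketch_next_step {
	SKETCH_WAIT_FOR_CALLBACK,          /* read submitted; end_io_func() will be invoked */
	SKETCH_FETCH_ONLY,                 /* not cached: fetch from the server */
	SKETCH_FETCH_THEN_WRITE_TO_CACHE,  /* fetch, then call cachefs_write_page() */
	SKETCH_FAIL,                       /* assumption: any other error */
};

static enum sketch_next_step sketch_after_read_or_alloc(int ret)
{
	assert(ret <= 0);  /* 0 or a negative errno value */

	switch (ret) {
	case 0:
		return SKETCH_WAIT_FOR_CALLBACK;
	case -ENOBUFS:
		return SKETCH_FETCH_ONLY;
	case -ENODATA:
		return SKETCH_FETCH_THEN_WRITE_TO_CACHE;
	default:
		return SKETCH_FAIL;
	}
}
```

In the -ENODATA case the page is written to the cache only after the netfs has downloaded its contents, matching the "initial download" case of the write function.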
+
+Secondly, if the netfs changes the contents of the page (either due to
+an initial download or if a user performs a write), then the page should be
+written back:
+
+ int cachefs_write_page(struct cachefs_cookie *cookie,
+ struct page *page,
+ cachefs_rw_complete_t end_io_func,
+ void *end_io_data,
+ unsigned long gfp);
+
+The cookie argument must specify a data file cookie, the page specified
+should have the data ready to be written from it (and is also used to specify
+the page number), and the gfp argument is used to control how any memory
+allocations made are satisfied.
+
+If the cookie indicates the inode is not cached then:
+
+ (1) The function will return -ENOBUFS.
+
+Else if there's a block allocated on disc to hold this page:
+
+ (1) The function will submit a request to write the data to the disc directly
+ from the page specified.
+
+ (2) The function will return 0.
+
+ (3) When the write is complete:
+
+ (a) Any associated validity journal entry will be cleared (the block now
+ contains valid data as far as CacheFS is concerned).
+
+ (b) end_io_func() will be invoked with:
+
+ (*) The netfs data supplied when the cookie was created.
+
+ (*) The page descriptor.
+
+ (*) The data passed to the above function.
+
+ (*) An argument that's 0 on success or negative for an error.
+
+ If an error happens, it can be assumed that the page has been
+ discarded from the cache.
+
+
+PAGE UNCACHING
+--------------
+
+To uncache a page, this function should be called:
+
+ void cachefs_uncache_page(struct cachefs_cookie *cookie,
+ struct page *page);
+
+This detaches the page specified from the data file indicated by the cookie and
+unbinds it from the underlying block.
+
+Note that pages can't be explicitly detached from a data file. The whole
+data file must be retired (see the relinquish cookie function below).
+
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions.
+
+
+INDEX AND DATA FILE UNREGISTRATION
+----------------------------------
+
+To get rid of a cookie, this function should be called.
+
+ void cachefs_relinquish_cookie(struct cachefs_cookie *cookie,
+ int retire);
+
+If retire is non-zero, then the index or file will be marked for recycling, and
+all copies of it will be removed from all active caches in which it is present.
+
+If retire is zero, then the inode may be available again the next time the
+acquisition function is called.
+
+One very important note - relinquish should NOT be called unless all "child"
+indexes, files and pages have been relinquished first.
+
+
+PAGE TOKEN MANAGEMENT
+---------------------
+
+As previously mentioned, the netfs must keep a token associated with each page
+currently actively backed by the cache. This is used by CacheFS to go from a
+page to the internal representation of the underlying block and back again. It
+is particularly important for managing the withdrawal of a cache whilst it is
+in active service (ie: it got unmounted).
+
+The token is this:
+
+ struct cachefs_page {
+ ...
+ };
+
+Note that all fields are for internal CacheFS use only.
+
+The token only needs to be allocated when CacheFS asks for it. This it will do
+by calling the get_page_cookie() method in the netfs definition ops table. Once
+allocated, the same token should be presented every time the method is called
+again for a particular page.
+
+The token should be retained by the netfs, and should be deleted only after the
+page has been uncached.
+
+One way to achieve this is to attach the token to page->private (and set the
+PG_private bit on the page) once allocated. Shortcut routines are provided by
+CacheFS to do this. Firstly, to retrieve if present and allocate if not:
+
+ int cachefs_page_get_private(struct page *page,
+ struct cachefs_page **_page,
+ unsigned gfp);
+
+Secondly to retrieve if present and BUG if not:
+
+ static inline
+ struct cachefs_page *__cachefs_page_get_private(struct page *page);
+
+To clean up the tokens, the netfs inode hosting the page should be provided
+with address space operations that circumvent the buffer-head operations for a
+page. For instance:
+
+	struct address_space_operations afs_fs_aops = {
+		...
+		.sync_page	= block_sync_page,
+		.set_page_dirty	= __set_page_dirty_nobuffers,
+		.releasepage	= afs_file_releasepage,
+		.invalidatepage	= afs_file_invalidatepage,
+	};
+
+	static int afs_file_invalidatepage(struct page *page,
+					   unsigned long offset)
+	{
+		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+
+		BUG_ON(!PageLocked(page));
+		if (!PagePrivate(page))
+			return 1;
+		cachefs_uncache_page(vnode->cache, page);
+		if (offset == 0)
+			return 1;
+		if (PageWriteback(page))
+			return 0;
+		return page->mapping->a_ops->releasepage(page, 0);
+	}
+
+	static int afs_file_releasepage(struct page *page, int gfp_flags)
+	{
+		struct cachefs_page *token;
+		struct afs_vnode *vnode = AFS_FS_I(page->mapping->host);
+
+		if (PagePrivate(page)) {
+			cachefs_uncache_page(vnode->cache, page);
+			token = (struct cachefs_page *) page->private;
+			page->private = 0;
+			ClearPagePrivate(page);
+			if (token)
+				kfree(token);
+		}
+		return 0;
+	}