mtd/Documentation/jffs3 JFFS3design.tex, 1.27, 1.28 definit.tex, 1.1, 1.2 intro.tex, 1.1, 1.2 jffs2.tex, 1.1, 1.2 jffs3req.tex, 1.1, 1.2 ref.tex, 1.1, 1.2 super.tex, 1.1, 1.2 tree.tex, 1.1, 1.2

Tue Nov 15 07:41:47 EST 2005

Update of /home/cvs/mtd/Documentation/jffs3
In directory phoenix.infradead.org:/tmp/cvs-serv11030

Modified Files:
	JFFS3design.tex definit.tex intro.tex jffs2.tex jffs3req.tex 
	ref.tex super.tex tree.tex 
Log Message:
Various fixes in various chapters.
Add 2 sub-sections to the "The tree" section.
Switch version to 0.29.



Index: JFFS3design.tex
===================================================================
RCS file: /home/cvs/mtd/Documentation/jffs3/JFFS3design.tex,v
retrieving revision 1.27
retrieving revision 1.28
diff -u -r1.27 -r1.28

--- JFFS3design.tex	13 Nov 2005 17:53:41 -0000	1.27
+++ JFFS3design.tex	15 Nov 2005 12:41:43 -0000	1.28
@@ -57,9 +57,9 @@
 \large{Artem B. Bityutskiy\\
 dedekind at infradead.org}\\
 \vspace{13cm}
-\large{Version 0.28 (draft)}\\
+\large{Version 0.29 (draft)}\\
 \vspace{0.5cm}
-November 12, 2005
+November 15, 2005
 \end{center}
 \end{titlepage}
 %\maketitle
@@ -181,7 +181,9 @@
 
 \item Direct I/O.
 
-\item When/how to update stat-data ?
+\item When/how to update stat-data?
+
+\item How t ochose/add a key scheme?
 
 \end{enumerate}
 
@@ -215,6 +217,8 @@
 \item The minimal amount of file's data in a node is \texttt{PAGE\_SIZE}. No
 way to create smaller nodes as it it possible in JFFS2.
 
+\item Portability.
+
 \end{enumerate}
 
 The following is the list of ideas which were thought about but are not yet in
@@ -274,6 +278,10 @@
 
 \item $I$~-- inode number.
 
+\item $K$, $K_x$~-- tree's keys.
+
+\item $k$, $k_x$~-- keys' fields.
+
 \item $L$~-- the number of levels in the tree.
 
 \item $m$~-- the number of eraseblocks used in the superblock management

Index: definit.tex
===================================================================
RCS file: /home/cvs/mtd/Documentation/jffs3/definit.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- definit.tex	13 Nov 2005 17:53:41 -0000	1.1
+++ definit.tex	15 Nov 2005 12:41:43 -0000	1.2
@@ -16,6 +16,8 @@
 \item \textbf{Acl}~-- an object in the tree containing inode's ACL. Refer to
 section~\ref{ref_SectionObjects} for more information.
 
+\item \textbf{Acl key}~-- acl object's key.
+
 \item \textbf{Acl node}~-- a leaf node containing an acl object.
 
 \item \textbf{Anchor eraseblock, anchor area}~-- the second and the third
@@ -27,6 +29,8 @@
 attributes like the type of compression, etc). Refer to
 section~\ref{ref_SectionObjects} for more information.
 
+\item \textbf{Attr-data key}~-- attr-data object's key.
+
 \item \textbf{Attr-data node}~-- a leaf node containing an \mbox{attr-data}
 object.
 
@@ -53,11 +57,15 @@
 \item \textbf{Data area}~-- the whole JFFS3 partition excluding the static
 superblock and anchor eraseblocks.
 
+\item \textbf{Data key}~-- data object's key.
+
 \item \textbf{Data node}~-- a leaf node with file's data.
 
 \item \textbf{Directory entry, direntry}~-- basically an association between
 the name and the inode number.
 
+\item \textbf{Direntry key}~-- direntry object's key.
+
 \item \textbf{Direntry node}~-- a leaf node containing a direntry object.
 
 \item \textbf{Dirt, dirty space}~-- information on flash which is not valid due
@@ -199,19 +207,23 @@
 \item \textbf{Xattr}~-- a widely used contracted form for \textbf{extended
 attributes}.
 
-\item \textbf{Xattr ID}~-- an unique identifier of an extended attribute. Refer to
-section~\ref{ref_SectionObjects} for more information.
-
 \item \textbf{Xattr-data}~-- an object in the tree containing the data of an
 extended attribute.
 
+\item \textbf{xattr-data key}~-- xattr-data object's key.
+
 \item \textbf{Xattr-data node}~-- a leaf node containing an \mbox{attr-data}
 object.
 
+\item \textbf{Xattr ID}~-- an unique identifier of an extended attribute. Refer to
+section~\ref{ref_SectionObjects} for more information.
+
 \item \textbf{Xentry}~-- an object in the tree which stores the association
 between the name of an extended attribute and its xattr~ID. Refer to
 section~\ref{ref_SectionObjects} for more information.
 
+\item \textbf{Xentry key}~-- xentry object's key.
+
 \item \textbf{Xentry node}~-- a leaf node containing a xentry object.
 
 \end{enumerate}

Index: intro.tex
===================================================================
RCS file: /home/cvs/mtd/Documentation/jffs3/intro.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- intro.tex	13 Nov 2005 17:53:41 -0000	1.1
+++ intro.tex	15 Nov 2005 12:41:43 -0000	1.2
@@ -354,19 +354,18 @@
 
 This section illustrates how does JFFS3 indexing work by means of a simple
 example. The example is very rough but it shows JFFS3 indexing in action. It is
-assumed that keys of direntries and data objects have layout that
-mentioned in an example in section~\ref{ref_SectionIndexing}.
+assumed that keys of direntries and data objects have the same layout that
+is mentioned in section~\ref{ref_SectionIndexing}.
 
-Suppose that an user does the following:
+Suppose that user does the following:
 
 \begin{enumerate}
 
 \item mounts JFFS3 file system to "\texttt{/mnt/jffs3}" directory;
 
-\item issues "\texttt{ls /mnt/jffs3}" command in order find out if there is
-a "\texttt{my\_file}" present there;
+\item issues "\texttt{ls /mnt/jffs3}" command;
 
-\item reads the contents of the "\texttt{my\_file}" file.
+\item reads the contents of "\texttt{/mnt/jffs3/my\_file}" file.
 
 \end{enumerate}
 
@@ -377,23 +376,31 @@
 
 \item During mount JFFS3 locates the position of the root node. This is done
 with help of the JFFS3 \emph{superblock} which will be described later
-(section~\ref{ref_SectionSB}).
+(see section~\ref{ref_SectionSB}).
 
-\item to find out all directory entries in the root directory, JFFS3 searches
-all objects matching \{\texttt{1},~$*$\} pattern. Indeed, direntry keys have
-\{\texttt{parent~inode~\#},~\texttt{name~hash}\} format, the root directory inode
-number is 1 (a predefined constant). So, JFFS3 searches for any direntry in the
-root directory ("$*$" means wildcard).
-
-\item JFFS3 reads the direntry object which corresponds to the
-"\texttt{my\_file}" file using \{\texttt{1},~$H($\texttt{"my\_file"$)$}\} key,
-where $H()$ is the hash function. The direntry object contains
-"\texttt{my\_file}"'s inode number ($I$). Then JFFS3 searches for data objects
-of the "\texttt{my\_file}" file using \{$I$,~\texttt{offset}\} keys. Depending
-on which part of file should be read, \texttt{offset} takes different values.
+\item To get the list of directory entries in the root directory, JFFS3
+looks~up all objects matching the \{\texttt{2},~$*$\} key pattern. Indeed,
+direntry keys have \{\texttt{parent~inode~\#},~\texttt{name~hash}\} format, the
+root directory inode number is 2 (or another predefined constant).  $*$" means
+wildcard. Thus, JFFS3, \{\texttt{2},~$*$\} will match to any direntry in the
+root directory.
+
+\item To read the "\texttt{my\_file}" file, JFFS3 first needs to find out its
+inode number. The inode number is stored in the directory entry object. Hence,
+JFFS3 reads \texttt{my\_file}'s direntry using
+\{\texttt{1},~$H($\texttt{"my\_file"$)$}\} key ($H()$ is the hash function).
+
+Then JFFS3 searches for \texttt{my\_file}'s data objects using
+\{$I$,~\texttt{offset}\} keys. Depending on which part of file should be read,
+\texttt{offset} may take different values.
 
 \end{enumerate}
 
+The above description is somewhat simplified e.g., JFFS3 also needs to read
+\texttt{my\_file}'s attr-data object to fetch the inode length from there (see
+section \ref{ref_SectionObjects}), etc. But the aim of the section is just to
+provide an idea of how JFFS3 indexing works.
+
 %
 % THE JOURNAL
 %




Index: super.tex
===================================================================
RCS file: /home/cvs/mtd/Documentation/jffs3/super.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- super.tex	13 Nov 2005 17:53:41 -0000	1.1
+++ super.tex	15 Nov 2005 12:41:43 -0000	1.2
@@ -194,7 +194,6 @@
 \frac{T_A}{T_D} \geqslant 1,
 \label{ref_EquationSBIneq}
 \end{equation}
-
 where $T_A$ is the period of time of the total anchor area wear and $T_D$ is
 the period of time of the total data area wear. Note, the whole JFFS3 partition
 excluding the static superblock and the anchor area is referred to as the
@@ -214,7 +213,6 @@
 $$
 T_D = \frac{(M-3) \cdot D \cdot N}{R_D},
 $$
-
 where $D$ is the maximum number of flash eraseblock erase cycles, and $M$ is
 the number of non-bad eraseblock on the JFFS3 partition. We subtracted 3 from
 $M$ to get the number of eraseblocks in the data area.
@@ -261,7 +259,6 @@
 $$
 \frac{T_A}{T_D} = 2 \cdot \frac{2N^3 + N^2 + N}{M-3},
 $$
-
 and for $m = 0,1,2,\ldots$
 
 $$
@@ -274,13 +271,11 @@
 $$
 2 \cdot \frac{2N^m + N^{m-1} + \ldots + N}{M-3} \geqslant 1,
 $$
-
 or neglecting the minor components,
 
 $$
 \frac{4N^m}{M-3} \geqslant 1,
 $$
-
 or
 
 \begin{equation}
@@ -337,7 +332,6 @@
 $$
 S = 2m + log_2(2N) + (m - 1) \cdot log_2(N)
 $$
-
 sectors.
 
 Table~\ref{ref_TableNANDTimes} contains the approximate superblock search time

Index: tree.tex
===================================================================
RCS file: /home/cvs/mtd/Documentation/jffs3/tree.tex,v
retrieving revision 1.1
retrieving revision 1.2
diff -u -r1.1 -r1.2
--- tree.tex	13 Nov 2005 17:53:41 -0000	1.1
+++ tree.tex	15 Nov 2005 12:41:43 -0000	1.2
@@ -59,9 +59,8 @@
 extended attributes and \emph{xattr~IDs}. Every extended attribute in the file
 system has a corresponding xattr entry object. This is analogous to direntry
 objects, but direntries contain
-\{\texttt{xattr~name}$\Rightarrow$\texttt{xattr~ID}\} mapping instead of
-\{\texttt{direntry~name}$\Rightarrow$\texttt{inode~number}\} mapping in
-xentries.
+\{\texttt{direntry~name}$\Rightarrow$\texttt{inode~number}\} mapping, instead
+of \{\texttt{xattr~name}$\Rightarrow$\texttt{xattr~ID}\} mapping in xentries.
 
 Each extended attribute in JFFS3 has its own unique number~-- xattr~ID, just
 like every inode has its own unique inode number. And in fact, JFFS3 utilizes
@@ -78,13 +77,13 @@
 (information about ACLs may be found out at~[\ref{ref_ACL}]). Acl objects are
 stored in \emph{acl nodes}.
 
-In \mbox{real-world} systems a lot of files have equivalent ACL while only few
-files have unique ACL. For the former group of files (or more strictly~--
-inodes) JFFS3 makes use of \emph{shared~acl} objects. This means, that there is
-only one acl object instance for all of these inodes. Shared acls are referred
-to from \mbox{attr-data} objects of these inodes. If a shared acl is written
-to, a new acl object is created (\mbox{copy-on-write} mechanism). Conversely, for
-the latter group there is a distinct acl object per each inode.
+In \mbox{real-world} systems a vast number of files have equivalent ACL while
+only few files have unique ACL. For the former group of files (or more
+strictly~-- inodes) JFFS3 makes use of \emph{shared~acl} objects. This means,
+that there is only one acl object instance for all of these inodes. Shared acls
+are referred to from \mbox{attr-data} objects of these inodes. If a shared acl
+is written to, a new acl object is created (\mbox{copy-on-write} mechanism).
+Conversely, for the latter group there is a distinct acl object per each inode.
 
 \end{enumerate}
 
@@ -93,16 +92,16 @@
 %
 \subsection{Keys} \label{ref_SectionKeys}
 
-Each object has its own key and may be quickly found out in the tree by its
+Each object has its own key and may be quickly looked~up in the tree by its
 key. As there are 6 object types in JFFS3, there are also 6 key types:
 
 \begin{enumerate}
 \item \emph{data keys}~-- index data objects;
 \item \emph{direntry keys}~-- index direntry objects;
-\item \emph{attr-data key}~-- index attr-data objects;
-\item \emph{xentry key}~-- index xentry objects;
-\item \emph{xattr-data key}~-- index xattr-data objects;
-\item \emph{acl key}~-- index acl objects.
+\item \emph{attr-data keys}~-- index attr-data objects;
+\item \emph{xentry keys}~-- index xentry objects;
+\item \emph{xattr-data keys}~-- index xattr-data objects;
+\item \emph{acl keys}~-- index acl objects.
 \end{enumerate}
 
 %
@@ -112,16 +111,15 @@
 
 Lets start discussing JFFS3 keys with an example of a simple key layout which
 is further referred to as the \emph{trivial key scheme}. All keys in this
-scheme have the same \mbox{67-bits} length (see
-figure~\ref{ref_FigureTrivKey}). 
+scheme have the same \mbox{56-bit} length (see figure~\ref{ref_FigureTrivKey}). 
 
 \begin{itemize}
 
 \item Data keys consist of the \mbox{32-bit} inode number the data belongs to,
-the unique \mbox{3-bit} key type identifier, and the \mbox{32-bit} data offset.
+the unique \mbox{3-bit} key type identifier, and the \mbox{19-bit} data offset.
 
 \item Direntry keys consist of the \mbox{32-bit} parent directory inode number,
-the unique \mbox{3-bit} key type identifier, and the \mbox{32-bit} direntry
+the unique \mbox{3-bit} key type identifier, and the \mbox{19-bit} direntry
 name hash value.
 
 \item Attr-data keys consist of the \mbox{32-bit} inode number the attributes
@@ -129,10 +127,10 @@
 
 \item Xentry keys consist of the \mbox{32-bit} inode number the extended
 attribute belongs to, the unique \mbox{3-bit} key type identifier, and the
-\mbox{32-bit} extended attribute name hash value.
+\mbox{19-bit} extended attribute name hash value.
 
 \item Xattr-data keys consist of the \mbox{32-bit} xattr ID, the unique
-\mbox{3-bit} key type identifier, and the \mbox{32-bit} extended attribute data
+\mbox{3-bit} key type identifier, and the \mbox{19-bit} extended attribute data
 offset.
 
 \item Acl keys consist of the \mbox{32-bit} inode number the acl object belongs
@@ -156,31 +154,87 @@
 \label{ref_FigureTrivKey}
 \end{figure}
 
-The trivial key scheme is not actually used in JFFS3 and is only needed to
-ease the further discussion.
+Since data objects may only contain multiple RAM~pages of data (excluding small
+files and files' tails) offsets in keys are always RAM~page
+\mbox{size-aligned}. Assuming RAM~page is 4K (13~bits), 19~bits is enough to
+refer up to 4GB of data. To put it differently, the trivial key scheme limits
+files' length to 4GB providing system's RAM~page size is 4K.
 
+It is also worth noting that several objects of the same type may have the same
+key. For example, in case of hash collision, two direntries may have equivalent
+keys. In this case objects may be distinguished by means of reading the
+corresponding leaf node headers.
+
+%
+% Keys comparison
+%
+\subsubsection{Keys comparison}
+
+The important topic is how keys are compared as it defines the relative order
+of objects in the tree and is crucial for searching. Note, it only makes sense
+to compare keys of the same type.
+
+JFFS3 keys are usually comprised of one or more fields, i.e., keys $K_1$ and
+$K_2$ may be represented as
+
+$$
+K_1 = \{k^1_1, k^2_1, ..., k^p_1\},
+K_2 = \{k^1_2, k^2_2, ..., k^p_2\},
+$$
+where $p$ is the number of components in keys of this type.
+
+Keys $K_1$ and $K_2$ are considered to be \emph{equivalent} if and only if all
+their fields are equivalent, i.e. $k^i_1 = k^i_2i = 1, 2, ..., p$.
+
+Keys are compared \mbox{field-by-field} starting from the first field. If on
+$i$'th step $k^i_1 > k^i_2$, then $K_1$ is considered to be \emph{greater} then
+$K_2$.  Similarly, if on $i$'th step $k^i_1 < k^i_2$, then $K_1$ is considered
+to be \emph{less} then $K_2$.
+
+%
+% Key schemes
+%
 \subsubsection{Key schemes}
 
-The trivial key scheme makes use of \mbox{32-bit} inode numbers and
-\mbox{32-bit} file offsets. But if a system needs to use huge files (larger
-then 4GB), 32 bits may also be insufficient and more bits should be used.
-Similarly, the file system may want to utilize huge number of files so
-\mbox{32-bits} may be not sufficient to store the inode number.
-
-From the other hand, some systems may have only few inodes and, say,
-24~bits may be enough. Similarly, some systems may have not very large flash
-partition and do not utilize large files, so 30~bits for the file offset may be
-enough for them.
-
-It also possible that one may want to use some tricky key layouts to achieve
-different kinds of optimizations. For example, direntry keys may include the
-first 8~bytes (64~bits) of the direntry name (see figure
+\emph{Key schemes} define layout of keys for all the 6 object types.
+Apparently, it would be too inflexible to hardcode JFFS3 to support only one
+fixed key scheme. Indeed, there may be a great deal of reasons why users may
+want to use different key schemes in different situations~-- some examples go
+bellow.
+
+\begin{itemize}
+
+\item The inode number is encoded by a \mbox{32-bit} integer in the trivial key
+scheme which means that about 4~million inodes and extended attributes may
+exist on the file system simultaneously. But in some cases this may be
+insufficient or, conversely, too much and one may want to use less bits to
+encode inode numbers (say, only 24 bits), just for optimization.
+
+\item Similarly, offsets are encoded as \mbox{19-bit} integers in the trivial
+key scheme which may be insufficient when huge files (larger then 4G) should be
+supported. So one may want to use more bits in certain cases.
+
+\item Depending on the concrete JFFS3 usage, different hash functions may be
+used in direntry keys. The length of hash values may also vary depending on how
+many directory entries are kept in directories. If there are huge directories
+with millions of files there, long hash values should be used to avoid massive
+hash collisions (say, \mbox{64-bit} hash values). But if it is known in advance
+that there will be no too large directory entries, \footnote{Note, when talking
+about directories, words "\emph{large}" and "\emph{small}" describe how many
+direntries are kept in these directories. The more direntries a directory
+contains, the larger is it.} the length of hash values may be shorter.
+
+\item It also possible that one may want to use some tricky key layouts to
+achieve different kinds of optimization. For example, direntry keys may
+include the first 8~bytes (64~bits) of the direntry name (see figure
 \ref{ref_FigureDirentKeyEx_01}). In this case the \texttt{getdents}
-\footnote{See \texttt{getdents(2)} Linux manual pages} Linux system call will
-return direntries in mostly alphabetically sorted order and \mbox{user-space}
+\footnote{See \texttt{getdents (2)} Linux manual pages} Linux system call will
+return direntries in "mostly" alphabetically sorted order and \mbox{user-space}
 programs will not spend much time to sort them. In fact this technique is used
-in Reiser4 and it is claimed that slow sorting is a bottleneck in certain file
-system workloads.
+in the Reiser4 file system and it is claimed that slow sorting is a bottleneck
+in certain \mbox{real-life} workloads. And the like.
+
+\end{itemize}
 
 %
 % Direntry key layout example
@@ -198,4 +252,6 @@
 \label{ref_FigureDirentKeyEx_01}
 \end{figure}
 
-\texttt{[To be continued.]}
+So it is obvious why JFFS3 is not attached to a fixed key scheme but instead,
+admits of many different key schemes (one at a time of course) with a
+possibility to choose the best suited key scheme.