mtd/fs/jffs3 JFFS3design.tex,1.27,1.28

Fri Mar 25 12:45:34 EST 2005

Update of /home/cvs/mtd/fs/jffs3
In directory phoenix.infradead.org:/tmp/cvs-serv22895

Modified Files:
	JFFS3design.tex 
Log Message:
Update.
Add some algorithm descriptions.


Index: JFFS3design.tex
===================================================================
RCS file: /home/cvs/mtd/fs/jffs3/JFFS3design.tex,v
retrieving revision 1.27
retrieving revision 1.28
diff -u -r1.27 -r1.28

--- JFFS3design.tex	15 Mar 2005 18:42:57 -0000	1.27
+++ JFFS3design.tex	25 Mar 2005 17:45:31 -0000	1.28
@@ -184,11 +184,11 @@
 generate redundant adler32 checksum for better performance.
 
 %
-% MAXIMAL SIZE OF INODE NODE DATA
+% MAXIMAL SIZE OF THE INODE NODE DATA
 %
-\section{Maximal size of inode node data}
+\section{Maximal size of the inode node data}
 It is proposed to enlarge the maximal data node size to effect several
-enchancements (see \ref{ref_BackgroundInformation} for more details).
+enhancements (see \ref{ref_BackgroundInformation} for more details).
 
 JFFS3's maximal data size is multiple of \texttt{PAGE\_SIZE}. At the
 first glance 4-8 pages is sane amount, although this depends on several
@@ -204,11 +204,11 @@
 information at once.
 \end{itemize}
 
-Concequently, the maximal size of the data node's data has to be configurable.
+Consequently, the maximal size of the data node's data has to be configurable.
 
 To facilitate efficient data node reading we propose to use the method which is
 exploited by ZISOFS filesystem. Basically, the idea is to read not exactly one
-page in the \texttt{read\_page()} operation, but bore pages thus propogating the
+page in the \texttt{read\_page()} operation, but bore pages thus propagating the
 Page Cache. By another words, if the data node being read contains several RAM
 pages of data we might read them all into the Page Cache at once. With this we
 will still have good read performance since we don't need to introduce
@@ -266,7 +266,7 @@
 because of its header;
 
 \item JFFS2 needs more RAM to keep track of nodes; this is the worst
-drawback, see (TODO) for more information about this JFFS2 and JFFS3.
+drawback, see (\ref{ref_JFFS2_and_mem}) for more information about this.
 \end{itemize}
 
 %
@@ -274,23 +274,164 @@
 %
 \section{Inode Checkpoints}
 
-\subsection{JFFS2 and memory consumption problem}
+\subsection{JFFS2 and memory consumption problem} \label{ref_JFFS2_and_mem}
 TODO.
 
-Work is in progress. Take a glimpse at\\
+Take a glimpse at\\
 \url{http://lists.infradead.org/pipermail/linux-mtd/2005-January/011671.html}
 for the description of the problem for now.
 
 %
+% DATA STRUCTURES
+%
+\section{Data Structures}
+
+%
+% ALGORITHMS
+%
+\section{Algorithms}
+This section describes the main JFFS3 algorithms in pseudo-code. The
+description doesn't contain all the details but refers most fundamental
+steps.
+
+\subsection{Inode build}
+The algorithm is initiated by the \texttt{iget()} VFS call. From the user's
+perspective it happens e.g. on \texttt{stat()} and \texttt{open()} calls.
+
+\begin{verbatim}
+For each node_ref
+{
+  read the corresponding node from the flash;
+  If the inode is the directory inode
+  {
+    insert the dirent node to dentree.
+  }
+  Else if the inode is the regular file inode
+  {
+    insert the data node to fragtree.
+  }
+  Else if the inode is the symlink inode
+  {
+    Read the symlink inode node and cache the symlink's target;
+  }
+  Else read the metadata for another types of the inode as well;
+}
+
+If the inode is the directory inode or the regular file inode
+{
+  For each icp_ref
+  {
+    read the corresponding ICP from the flash;
+    For each node in this ICP
+    {
+      If the inode is the directory inode
+      {
+        insert the dirent node to dentree.
+      }
+      Else if the inode is the regular file inode
+      {
+        insert the data node to fragtree.
+      }
+    }
+  }
+}
+\end{verbatim}
+
+Notes:
+\begin{itemize}
+\item Since the nodes which aren't referred by any ICP are younger (have
+the higher version) then those which refereed, we start building fragtree
+from them. In this case, we will have fewer RB-tree balancing
+operations.
+
+\item ICP entries are sorted by versions in ICPs and they must be
+processed in the descended version order.
+\end{itemize}
+
+\subsection{Garbage Collection}
+\begin{verbatim}
+Choose the vblock to GC (gcblock);
+Read the gcblock's summary;
+
+For each node in the gcblock
+{
+  If node is valid
+  {
+    move the node;
+    If the node is described by an ICP
+    {
+      Obsolete the corresponding ICP entry in the ICP;
+    }
+  }
+}
+\end{verbatim}
+
+Notes:
+\begin{itemize}
+\item See [\ref{ref_Node_Valid}] for the algorithm of checking whether a
+node is valid;
+
+\item See [\ref{ref_Move_Node}] for the algorithm of a node moving;
+
+\item See [\ref{ref_Obsolete_Entry}] for the algorithm of how to
+obsolete an ICP entry in the ICP.
+\end{itemize}
+
+\subsubsection{Check if node is valid} \label{ref_Node_Valid}
+Each vblock has an associated list of its obsolete nodes. To check if
+a vblock's node is valid the list should be scanned.
+
+\subsubsection{Move a node} \label{ref_Move_Node}
+TODO
+
+\subsection{Obsolete an ICP entry} \label{ref_Obsolete_Entry}
+An ICP is associated with the icp\_ref. One inode may have more then on
+ICP.
+
+An ICP covers some version range and includes ICP entries referring
+\emph{all} the \emph{valid} nodes within that range.
+
+An icp\_ref contains a counter of valid nodes that it refers. When one
+of its nodes becomes obsolete, the counter is decremented. When the
+number of valid nodes referred by the ICP becomes too small, the ICP is
+treated as obsolete and is removed from the inode's icp\_ref list.
+
+\begin{verbatim}
+Decrement the icp_ref's valid nodes count;
+
+If the count becomes too small
+{
+  For each valid node in the icp
+  {
+    Allocate the node_ref for it;
+    Insert this node_ref to the node_refs list;
+  }
+
+  Remove the icp_ref from the icp_ref list;
+  Add the icp_ref's node_ref to the list of obsolete nodes;
+}
+\end{verbatim}
+
+\subsection{Flash scan}
+
+\subsection{FS build}
+
+\subsection{Memory shrink}
+TODO
+
+
+%
 % ABBREVIATIONS
 %
 \section{Abbreviations}
 \begin{enumerate}
 \item \emph{BB} - Bad Block
 \item \emph{CRC} - Cyclic Redundancy Check
+\item \emph{ICP} - Inode Check-Point
 \item \emph{JFFS2} - Journalling Flash File System version 2
 \item \emph{JFFS3} - Journalling Flash File System version 3
-\item \emph{MTD} - Memory Technoligy Devices
+\item \emph{MTD} - Memory Technology Devices
+\item \emph{VFS} - Virtual File System
 \end{enumerate}
 
 %
@@ -310,6 +451,11 @@
 
 \item \emph{Data node} -- the inode node relating to file, not to directory.
 
+\item \emph{Dentree, directory entry tree} -- TODO. Not designed yet.
+The Idea is to keep 
+direntries of the directory in a sort of balanced tree, not in a list as
+it is done in JFFS2.
+
 \item \emph{Direntry node} -- node representing JFFS3 directory entry.
 Each JFFS3 directory entry is represented by one valid direntry node.
 
@@ -319,32 +465,53 @@
 differently, the dirty space is comprised of obsolete nodes and
 paddings.
 
+\item \emph{Fragtree, fragment tree} -- a Red-Black tree of the file
+fragments scattered over JFFS3 data nodes. Used to quickly lookup which
+nodes ought to be read in order to get the needed file data range.
+
 \item \emph{Garbage Collector} -- very important part of JFFS3
 design designated to reclaim dirt. Namely, GC moves
 valid nodes from dirty vblocks to newly erased vblocks reclaiming the space
 occupied by dirt. Another substantial aspect GC is responsible for is
 the wear-levelling support and the production of pristine nodes.
 
+\item \emph{Gcblock} -- the block which is currently being Garbage
+Collected, i.e., the valid nodes are moved from this block by the GC.
+
 \item \emph{Node} --
 basic JFFS3 data structure. Anything that JFFS3 stores on flash is
 stored in form of node(s). There are different types of nodes defined.
 
 \item \emph{Node header} --
-node and the relative data structure largely consists of two parts - header and
+nodes and the relating data structures largely consist of two parts - header and
 data. Node header contains internal information while node data contains some
 user-visible data like file's content or direntry's name.
 
-\item \emph{Non-pristine node} -- a type of data node which containes the amount
+\item \emph{Non-pristine node} -- a type of data node which contains the amount
 of uncompressed data which is not multiple of \texttt{PAGE\_SIZE} bytes.
 
+i\item \emph{ICP entry} - a short record in ICP referring a node.
+
+\item \emph{Icp\_ref} - a small object representing an ICP. Belongs to
+the in-core objects group.
+
 \item \emph{In-core memory, in-core RAM} -- the memory that is consumed by
-node\_ref objects.
+node\_ref, inode\_ref and icp\_ref objects.
+
+\item \emph{Inode\_ref} - a small object representing an inode (of any
+type). Belongs to the in-core objects group.
+
+\item \emph{Inode build} -- a process which is activated by the
+\texttt{iget()} VFS call. The main tasks are: to build the dentree
+for th directory inodes and to build fragtree
+for the regular file inodes.
 
-\item \emph{Inode node} -- node representing JFFS3 inode.
+\item \emph{Inode node} -- a node representing JFFS3 inode.
 Any inode in JFFS3 is represented by one (for directory inodes) or more
 (for file inodes) inode nodes.
 
-\item \emph{Node\_ref} - a small object representing one JFFS3 node (of any type).
+\item \emph{Node\_ref} - a small object representing any JFFS3 node (of any type).
+Belongs to the in-core objects group.
 
 \item \emph{Obsolete node} - a node which does not contain any
 actual information. The exact semantic of obsolete node depends on its