mtd/fs/jffs3 JFFS3design.tex,1.47,1.48
Artem Bityuckiy
dedekind at infradead.org
Sun Apr 24 07:50:34 EDT 2005
Update of /home/cvs/mtd/fs/jffs3
In directory phoenix.infradead.org:/tmp/cvs-serv17477
Modified Files:
JFFS3design.tex
Log Message:
Update Directory checkpoint chapter
Index: JFFS3design.tex
===================================================================
RCS file: /home/cvs/mtd/fs/jffs3/JFFS3design.tex,v
retrieving revision 1.47
retrieving revision 1.48
diff -u -r1.47 -r1.48
--- JFFS3design.tex 23 Apr 2005 17:46:24 -0000 1.47
+++ JFFS3design.tex 24 Apr 2005 11:50:31 -0000 1.48
@@ -15,6 +15,9 @@
\topmargin=0cm
\textheight=24cm
+% Define TODO command
+\newcommand{\TODO}[1]{({\textbf{TODO}: #1})\marginpar{\large \textbf{!?!}}}
+
\begin{document}
%
@@ -199,10 +202,10 @@
entries in it) is browsed, more RAM is needed.
\item[Peak RAM usage.] The JFFS2 memory consumption also depends on how data is
-written. Each transaction goes directly to Flash (through the
-\mbox{write-buffer} for \mbox{page-based} Flashes like NAND), i.e., there is
+written. Each transaction goes directly to flash (through the
+\mbox{write-buffer} for \mbox{page-based} flashes like NAND), i.e., there is
a \mbox{Write-Through} cache in Linux/JFFS2. Consequently, small transactions
-result in a great deal of small nodes on Flash and hence, much memory is required,
+result in a great deal of small nodes on flash and hence much memory is required
for \mbox{in-core} objects and the fragtree. Later, small nodes will be merged by
GC (maximum \texttt{PAGE\_SIZE} bytes of a files' data may be stored in
one node) and the amount of consumed memory will be much lower. But the peak
@@ -215,12 +218,12 @@
\subsection{Mount time}
The slow mount is the most prominent and annoying JFFS2 shortcoming. The
reason is, again, the JFFS2 design, which doesn't assume any definite
-structure on the Flash media but instead, makes use of one big log made
+structure on the flash media but instead, makes use of one big log made
up of nodes -- the only JFFS2 \mbox{on-flash} data structure.
But unfortunately, this has several drawbacks, for example -- slow mount.
-Because of absence of any Flash structure, JFFS2 needs to scan the whole
-Flash partition to identify all nodes and to build the file system map. This
-takes much time, especially on large Flash partitions.
+Because of the absence of any flash structure, JFFS2 needs to scan the whole
+flash partition to identify all nodes and to build the file system map. This
+takes a long time, especially on large flash partitions.
To increase the mount speed, JFFS2 performs as little work as possible during
the mount process and defers
@@ -242,7 +245,7 @@
writers for a considerable time interval.
\end{description}
-TODO: show some real numbers here.
+\TODO{show some real numbers here}
The first problem may be solved by a patch referred to as a "summary patch" and
accessible at \url{http://www.inf.u-szeged.hu/jffs2/}.
@@ -266,7 +269,7 @@
build time is a linear function of the number of nodes.
\end{description}
-TODO: provide some real numbers here
+\TODO{provide some real numbers here}
\subsection{Conclusion}
Historically, JFFS2 was designed for small NOR flash chips and didn't even
@@ -278,7 +281,7 @@
space as it actually needs, without wasting much space as in the case of
"traditional" file systems for block devices;
\item admitting very efficient use of "on-the-fly" compression, which
-allows to fit a big deal of data to Flash;
+allows fitting a great deal of data on flash;
\item quick read and write operations;
\item natural unclean reboot robustness;
\item good wear-leveling.
@@ -298,7 +301,7 @@
\begin{enumerate}
\item Fully POSIX-compatible file system.
-\item The main OS is Linux (TODO: nonetheless portable?).
+\item The main OS is Linux. \TODO{nonetheless portable?}
\item Good scalability.
\item Preserve the possibility to efficiently use compression (like in JFFS2).
\item Provide good wear-leveling.
@@ -306,7 +309,7 @@
\item Fast mounting.
\item Reasonably low memory consumption.
\item Fast enough read/write operations.
-\item Support different Flash types (NOR, NAND, etc).
+\item Support different flash types (NOR, NAND, etc).
\item Support xattr.
\end{enumerate}
@@ -330,7 +333,7 @@
\end{itemize}
%
-% Data checkpoints
+% File checkpoints
%
\subsection{File checkpoints}
File checkpoints (FCP):
@@ -378,7 +381,7 @@
struct jffs3_raw_fcp_entry
{
- uint32_t phys_offs; /* The position of the node on Flash. */
+ uint32_t phys_offs; /* The position of the node on flash. */
uint32_t offs; /* Offset of the data range the node refers. */
uint16_t len; /* The length of the node data range. */
} __attribute__((packed));
@@ -404,14 +407,14 @@
\item[FCP node size.] The FCP node physical size should be
neither too large nor too small. To facilitate faster FCP updates on NAND
-flashes the maximum FCP node physical size ($S_{FCP}$) should be one NAND
-page (or less).
+flashes the maximum FCP node physical size ($S_{FCP}$) should be one
+flash page (or less).
Since one FCP entry takes 10 bytes and the FCP node header takes 20 bytes,
the maximum number of FCP entries $E_{FCP}$ will be:
\begin{center}
\begin{tabular}{ll}
-\textbf{NAND page size} & $\bf E_{FCP}$ \\
+\textbf{Flash page size} & $\bf E_{FCP}$ \\
\hline
512 bytes & 49\\
2048 bytes & 202\\
@@ -426,9 +429,9 @@
$$R_{FCP} > \mathtt{PAGE\_SIZE}{\cdot}E_{FCP},$$
where $E_{FCP}$ is the
maximal number of FCP entries in FCP. The table below illustrates
-the average data node range size depending on $R$ and NAND page size.
+the average data node range size depending on $R$ and flash page size.
-512 bytes NAND page:
+512-byte flash page:
\begin{center}
\begin{tabular}{ll}
$\bf R_{FCP}$ & \textbf{Average $\bf R_{data}$}\\
@@ -438,7 +441,7 @@
\end{tabular}
\end{center}
-2048 bytes NAND page:
+2048-byte flash page:
\begin{center}
\begin{tabular}{ll}
$\bf R_{FCP}$ & \textbf{Average $\bf R_{data}$}\\
@@ -449,7 +452,7 @@
\end{tabular}
\end{center}
-4096 bytes NAND page:
+4096-byte flash page:
\begin{center}
\begin{tabular}{ll}
$\bf R_{FCP}$ & \textbf{Average $\bf R_{data}$}\\
@@ -460,7 +463,7 @@
\end{tabular}
\end{center}
-The above tables assumei that each FCP entry may at most refer 4~KiB of data.
+The above tables assume that each FCP entry may refer to at most 4~KiB of data.
For NOR and other non-paged flashes JFFS3 just assumes that
the flash page size is virtually 512 bytes or larger, depending on the
@@ -468,22 +471,22 @@
\end{description}
If we keep in-core references of all the FCP nodes of a file, we may
-waste a lot ot RAM. Even assuming that each reference takes 4~bytes
+waste a lot of RAM. Even assuming that each reference takes 4~bytes
(which is in fact unrealistically little) we have:
\begin{center}
\begin{tabular}{lllll}
-\textbf{File size} & \textbf{NAND Page size} &
+\textbf{File size} & \textbf{Flash page size} &
$\bf R_{FCP}$ & \textbf{RAM required} &
\textbf{Flash overhead}\\
\hline
64 MiB & 512 bytes & 128 KiB & 2 MiB & 256 KiB (0.4\%)\\
1 GiB & 2048 bytes & 512 KiB & 8 MiB & 4 MiB (0.4\%)\\
-8 GiB & 4096 bytes & 1 MiB & 32 MiB & 16 MiB (0.4\%)\\
+4 GiB & 4096 bytes & 1 MiB & 16 MiB & 16 MiB (0.4\%)\\
\end{tabular}
\end{center}
-The above table also shows the amount of Flash space required to store
-FCP nodes (assuming each FCP entry takes the whole Flash page).
+The above table also shows the amount of flash space required to store
+FCP nodes (assuming each FCP node takes the whole flash page).
To relax RAM usage, JFFS3 makes use of \emph{second level file
checkpoints} (FCP2) referring
@@ -493,19 +496,19 @@
\begin{itemize}
\item FCP2 range $R_{FCP2}$ is multiple of FCP range $R_{FCP}$;
\item the largest FCP2 node physical size $S_{FCP2}$ is limited by the
-NAND page size;
+flash page size;
\end{itemize}
With FCP2, the above table becomes:
\begin{center}
\begin{tabular}{lllll}
-\textbf{File size} & \textbf{NAND Page size} &
+\textbf{File size} & \textbf{Flash page size} &
$\bf R_{FCP}$ & $\bf R_{FCP2}$
& \textbf{RAM required}\\
\hline
64 MiB & 512 bytes & 256 KiB & 8 MiB & 32 bytes\\
1 GiB & 2048 bytes & 1 MiB & 128 MiB & 32 bytes\\
-8 GiB & 4096 bytes & 2 MiB & 512 MiB & 64 bytes\\
+4 GiB & 4096 bytes & 1 MiB & 256 MiB & 64 bytes\\
\end{tabular}
\end{center}
@@ -513,6 +516,9 @@
denoting the minimal size of file which can have an
associated FCP2 node.
+%
+% Directory checkpoints
+%
\subsection{Directory checkpoints}
Directory checkpoints (DCP):
\begin{itemize}
@@ -529,8 +535,8 @@
uint16_t magic; /* Magic bitmask of DCP node. */
uint64_t version; /* Helps to differentiate between valid
and obsolete directory checkpoints. */
- uint32_t hstart; /* Starting value of the DCP hash range. */
- uint32_t hend; /* Ending value of the DCP hash range. */
+ uint32_t hstart; /* Starting value of the DCP hash diapason. */
+ uint32_t hend; /* Ending value of the DCP hash diapason. */
uint32_t hdr_crc; /* DCP Header CRC32 checksum. */
uint32_t crc; /* DCP payload CRC32 checksum. */
@@ -551,6 +557,86 @@
$H$ mapping directory entry names to 32-bit integers. In case of good
$H$ the probability of name collisions will be $1/2^{32}$.
+It makes sense to use CRC32 as the hash function for the following
+reasons:
+\begin{itemize}
+\item CRC32 is certainly a good hash function;
+\item directory entry names on flash are protected by CRC32, so we
+would calculate the CRC32 anyway when writing the directory entry.
+\end{itemize}
+
+To refer to directory checkpoints, JFFS3 ought to keep in-core:
+\begin{itemize}
+\item the DCP offset on flash;
+\item the starting hash diapason value;
+\item the ending hash diapason value.
+\end{itemize}
+
+To find the DCP which refers to the direntry node with a given name, JFFS3
+performs the following steps:
+\begin{itemize}
+\item calculates the hash value of the given name;
+\item looks for the DCP whose hash diapason includes the calculated
+hash value;
+\item reads the DCP node and finds out the directory entry node offset;
+\item reads the direntry node.
+\end{itemize}
+
+Unlike FCP, directory checkpoints have no criterion for splitting them
+on a fixed basis. Instead, a DCP may refer to any hash diapason, from the
+whole \mbox{\texttt{0x00000000}-\texttt{0xFFFFFFFF}} range down to a
+diapason including only one hash value (if there are too many
+hash collisions). This means that JFFS3 dynamically splits and merges
+DCP diapasons when new direntries are created and old direntries are
+deleted or changed.
+
+The following constants are defined for directory checkpoints:
+\begin{enumerate}
+\item $E_{DCP}^{min}$ -- the minimal number of entries a DCP may
+refer to;
+\item $E_{DCP}^{max}$ -- the maximal number of entries a DCP may refer to.
+\end{enumerate}
+
+The DCP node's physical size is restricted by the flash page size. Hence,
+assuming $E_{DCP}^{min} = E_{DCP}^{max}/3$,
+$E_{DCP}^{min}$ and $E_{DCP}^{max}$ for different flash page sizes are:
+
+\begin{center}
+\begin{tabular}{lll}
+\textbf{Flash page size} & $\bf E_{DCP}^{min}$ & $\bf E_{DCP}^{max}$\\
+\hline
+512 bytes & 20 & 60\\
+2048 bytes & 84 & 252\\
+4096 bytes & 169 & 508\\
+\end{tabular}
+\end{center}
+
+The JFFS3 DCP merging/splitting algorithm works as follows:
+\begin{itemize}
+\item as long as there are few direntries in the directory (i.e., fewer than
+$E_{DCP}^{max}$), JFFS3 maintains only one DCP comprising the whole
+\mbox{32-bit} hash diapason;
+
+\item when more direntries are added, JFFS3 creates more directory
+checkpoints splitting the hash diapason;
+
+\item JFFS3 inserts direntries into the DCP with the corresponding hash
+diapason; when the number of direntries associated with the DCP becomes
+too large (i.e., greater than $E_{DCP}^{max}$), the DCP is split into two
+directory checkpoints and the hash diapason is split correspondingly;
+
+\item when directory entries are deleted, the DCPs which correspond to
+their diapasons become smaller; when a DCP becomes too small (i.e., the
+number of direntry nodes it refers to becomes less than $E_{DCP}^{min}$),
+JFFS3 either merges it with one of its small neighbors or splits one of the
+neighbors and then merges the two adjacent small diapasons.
+\end{itemize}
+
+When several direntries have colliding $H$ values, JFFS3 puts them
+into the same diapason. If the number of colliding direntries becomes
+equal to $E_{DCP}^{max}$, JFFS3 will refuse to add one more direntry
+with the colliding hash value. This implies there is a restriction on the
+number of direntries in a directory.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
@@ -955,8 +1041,7 @@
\item each data node involves some additional flash space overhead
because of its header;
-\item JFFS2 needs more RAM to keep track of nodes; this is the worst
-drawback, see (\ref{ref_JFFS2_and_mem}) for more information about this.
+\item JFFS2 needs more RAM to keep track of nodes.
\end{itemize}
@@ -1135,9 +1220,8 @@
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Credits}
-David Woodhouse\\
Joern Engel\\
-Thomas Gleixner\\
+David Woodhouse\\
Josh Boyer\\
Ferenc Havasi\\
Joakim Tjernlund\\