mtd/fs/jffs3 JFFS3design.tex,1.47,1.48
Artem Bityuckiy
dedekind at infradead.org
Sun Apr 24 07:50:34 EDT 2005
Update of /home/cvs/mtd/fs/jffs3
In directory phoenix.infradead.org:/tmp/cvs-serv17477
Modified Files:
JFFS3design.tex
Log Message:
Update Directory checkpoint chapter
Index: JFFS3design.tex
===================================================================
RCS file: /home/cvs/mtd/fs/jffs3/JFFS3design.tex,v
retrieving revision 1.47
retrieving revision 1.48
diff -u -r1.47 -r1.48
--- JFFS3design.tex 23 Apr 2005 17:46:24 -0000 1.47
+++ JFFS3design.tex 24 Apr 2005 11:50:31 -0000 1.48
@@ -15,6 +15,9 @@
\topmargin=0cm
\textheight=24cm
+% Define TODO command
+\newcommand{\TODO}[1]{({\textbf{TODO}: #1})\marginpar{\large \textbf{!?!}}}
+
\begin{document}
%
@@ -199,10 +202,10 @@
entries in it) is browsed, more RAM is needed.
\item[Peak RAM usage.] The JFFS2 memory consumption also depends on how data is
-written. Each transaction goes directly to Flash (through the
-\mbox{write-buffer} for \mbox{page-based} Flashes like NAND), i.e., there is
+written. Each transaction goes directly to flash (through the
+\mbox{write-buffer} for \mbox{page-based} flashes like NAND), i.e., there is
a \mbox{Write-Through} cache in Linux/JFFS2. Consequently, small transactions
-result in a great deal of small nodes on Flash and hence, much memory is required,
+result in a great deal of small nodes on flash and hence much memory is required
for \mbox{in-core} objects and the fragtree. Later, small nodes will be merged by
GC (maximum \texttt{PAGE\_SIZE} bytes of a files' data may be stored in
one node) and the amount of consumed memory will be much lower. But the peak
@@ -215,12 +218,12 @@
\subsection{Mount time}
The slow mount is the most prominent and annoying JFFS2 shortcoming. The
reason is, again, the JFFS2 design, which doesn't assume any definite
-structure on the Flash media but instead, makes use of one big log made
+structure on the flash media but instead, makes use of one big log made
up of nodes -- the only JFFS2 \mbox{on-flash} data structure.
But unfortunately, this has several drawbacks, for example -- slow mount.
-Because of absence of any Flash structure, JFFS2 needs to scan the whole
-Flash partition to identify all nodes and to build the file system map. This
-takes much time, especially on large Flash partitions.
+Because of the absence of any flash structure, JFFS2 needs to scan the whole
+flash partition to identify all nodes and to build the file system map. This
+takes a long time, especially on large flash partitions.
To increase the mount speed, JFFS2 performs as little work as possible during
the mount process and defers
@@ -242,7 +245,7 @@
writers for a considerable time interval.
\end{description}
-TODO: show some real numbers here.
+\TODO{show some real numbers here}
The first problem may be solved by a patch referred to as a "summary patch" and
accessible at \url{http://www.inf.u-szeged.hu/jffs2/}.
@@ -266,7 +269,7 @@
build time is a linear function of the number of nodes.
\end{description}
-TODO: provide some real numbers here
+\TODO{provide some real numbers here}
\subsection{Conclusion}
Historically, JFFS2 was designed for small NOR flash chips and didn't even
@@ -278,7 +281,7 @@
space as it actually needs, without wasting much space as in the case of
"traditional" file systems for block devices;
\item admitting very efficient use of "on-the-fly" compression, which
-allows to fit a big deal of data to Flash;
+allows fitting a great deal of data on flash;
\item quick read and write operations;
\item natural unclean reboot robustness;
\item good wear-leveling.
@@ -298,7 +301,7 @@
\begin{enumerate}
\item Fully POSIX-compatible file system.
-\item The main OS is Linux (TODO: nonetheless portable?).
+\item The main OS is Linux. \TODO{nonetheless portable?}
\item Good scalability.
\item Preserve the possibility to efficiently use compression (like in JFFS2).
\item Provide good wear-leveling.
@@ -306,7 +309,7 @@
\item Fast mounting.
\item Reasonably low memory consumption.
\item Fast enough read/write operations.
-\item Support different Flash types (NOR, NAND, etc).
+\item Support different flash types (NOR, NAND, etc).
\item Support xattr.
\end{enumerate}
@@ -330,7 +333,7 @@
\end{itemize}
%
-% Data checkpoints
+% File checkpoints
%
\subsection{File checkpoints}
File checkpoints (FCP):
@@ -378,7 +381,7 @@
struct jffs3_raw_fcp_entry
{
- uint32_t phys_offs; /* The position of the node on Flash. */
+ uint32_t phys_offs; /* The position of the node on flash. */
uint32_t offs; /* Offset of the data range the node refers. */
uint16_t len; /* The length of the node data range. */
} __attribute__((packed));
@@ -404,14 +407,14 @@
\item[FCP node size.] The FCP node physical size should be
neither too large nor too small. To facilitate faster FCP updates on NAND
-flashes the maximum FCP node physical size ($S_{FCP}$) should be one NAND
-page (or less).
+flashes the maximum FCP node physical size ($S_{FCP}$) should be one
+flash page (or less).
Since one FCP entry takes 10 bytes and the FCP node header takes 20 bytes,
the maximum number of FCP entries $E_{FCP}$ will be:
\begin{center}
\begin{tabular}{ll}
-\textbf{NAND page size} & $\bf E_{FCP}$ \\
+\textbf{Flash page size} & $\bf E_{FCP}$ \\
\hline
512 bytes & 49\\
2048 bytes & 202\\
@@ -426,9 +429,9 @@
$$R_{FCP} > \mathtt{PAGE\_SIZE}{\cdot}E_{FCP},$$
where $E_{FCP}$ is the
maximal number of FCP entries in FCP. The table below illustrates
-the average data node range size depending on $R$ and NAND page size.
+the average data node range size depending on $R$ and flash page size.
-512 bytes NAND page:
+512-byte flash page:
\begin{center}
\begin{tabular}{ll}
$\bf R_{FCP}$ & \textbf{Average $\bf R_{data}$}\\
@@ -438,7 +441,7 @@
\end{tabular}
\end{center}
-2048 bytes NAND page:
+2048-byte flash page:
\begin{center}
\begin{tabular}{ll}
$\bf R_{FCP}$ & \textbf{Average $\bf R_{data}$}\\
@@ -449,7 +452,7 @@
\end{tabular}
\end{center}
-4096 bytes NAND page:
+4096-byte flash page:
\begin{center}
\begin{tabular}{ll}
$\bf R_{FCP}$ & \textbf{Average $\bf R_{data}$}\\
@@ -460,7 +463,7 @@
\end{tabular}
\end{center}
-The above tables assumei that each FCP entry may at most refer 4~KiB of data.
+The above tables assume that each FCP entry may refer to at most 4~KiB of data.
For NOR and other non-paged flashes JFFS3 just assumes that
the flash page size is virtually 512 bytes or larger, depending on the
@@ -468,22 +471,22 @@
\end{description}
If we keep in-core references of all the FCP nodes of a file, we may
-waste a lot ot RAM. Even assuming that each reference takes 4~bytes
+waste a lot of RAM. Even assuming that each reference takes 4~bytes
(which is in fact unrealistically little) we have:
\begin{center}
\begin{tabular}{lllll}
-\textbf{File size} & \textbf{NAND Page size} &
+\textbf{File size} & \textbf{Flash page size} &
$\bf R_{FCP}$ & \textbf{RAM required} &
\textbf{Flash overhead}\\
\hline
64 MiB & 512 bytes & 128 KiB & 2 MiB & 256 KiB (0.4\%)\\
1 GiB & 2048 bytes & 512 KiB & 8 MiB & 4 MiB (0.4\%)\\
-8 GiB & 4096 bytes & 1 MiB & 32 MiB & 16 MiB (0.4\%)\\
+4 GiB & 4096 bytes & 1 MiB & 16 MiB & 16 MiB (0.4\%)\\
\end{tabular}
\end{center}
-The above table also shows the amount of Flash space required to store
-FCP nodes (assuming each FCP entry takes the whole Flash page).
+The above table also shows the amount of flash space required to store
+FCP nodes (assuming each FCP node takes the whole flash page).
To relax RAM usage, JFFS3 makes use of \emph{second level file
checkpoints} (FCP2) referring
@@ -493,19 +496,19 @@
\begin{itemize}
\item FCP2 range $R_{FCP2}$ is multiple of FCP range $R_{FCP}$;
\item the largest FCP2 node physical size $S_{FCP2}$ is limited by the
-NAND page size;
+flash page size;
\end{itemize}
With FCP2, the above table becomes:
\begin{center}
\begin{tabular}{lllll}
-\textbf{File size} & \textbf{NAND Page size} &
+\textbf{File size} & \textbf{Flash page size} &
$\bf R_{FCP}$ & $\bf R_{FCP2}$
& \textbf{RAM required}\\
\hline
64 MiB & 512 bytes & 256 KiB & 8 MiB & 32 bytes\\
1 GiB & 2048 bytes & 1 MiB & 128 MiB & 32 bytes\\
-8 GiB & 4096 bytes & 2 MiB & 512 MiB & 64 bytes\\
+4 GiB & 4096 bytes & 1 MiB & 256 MiB & 64 bytes\\
\end{tabular}
\end{center}
@@ -513,6 +516,9 @@
denoting the minimal size of file which can have an
associated FCP2 node.
+%
+% Directory checkpoints
+%
\subsection{Directory checkpoints}
Directory checkpoints (DCP):
\begin{itemize}
@@ -529,8 +535,8 @@
uint16_t magic; /* Magic bitmask of DCP node. */
uint64_t version; /* Helps to differentiate between valid
and obsolete directory checkpoints. */
- uint32_t hstart; /* Starting value of the DCP hash range. */
- uint32_t hend; /* Ending value of the DCP hash range. */
+ uint32_t hstart; /* Starting value of the DCP hash diapason. */
+ uint32_t hend; /* Ending value of the DCP hash diapason. */
uint32_t hdr_crc; /* DCP Header CRC32 checksum. */
uint32_t crc; /* DCP payload CRC32 checksum. */
@@ -551,6 +557,86 @@
$H$ mapping directory entry names to 32-bit integers. In case of good
$H$ the probability of name collisions will be $1/2^{32}$.
+It makes sense to use CRC32 as the hash function for the following
+reasons:
+\begin{itemize}
+\item CRC32 is certainly a good hash function;
+\item directory entry names on flash are protected by CRC32, so we
+would calculate the CRC32 anyway when writing the directory entry.
+\end{itemize}
+
+To refer to directory checkpoints, JFFS3 ought to keep in-core:
+\begin{itemize}
+\item the DCP offset on flash;
+\item the starting hash diapason value;
+\item the ending hash diapason value.
+\end{itemize}
+
+To find the DCP which refers to the direntry node with a given name, JFFS3
+performs the following steps:
+\begin{itemize}
+\item calculates the hash value of the given name;
+\item looks for the DCP whose hash diapason includes the calculated
+hash value;
+\item reads the DCP node and finds out the directory entry node offset;
+\item reads the direntry node.
+\end{itemize}
+
+Unlike FCP, directory checkpoints have no criterion for splitting them
+on a fixed basis. Instead, a DCP may refer to any hash diapason, from the
+whole \mbox{\texttt{0x00000000}-\texttt{0xFFFFFFFF}} range down to a
+diapason including only one hash value (if there are too many
+hash collisions). This means that JFFS3 dynamically splits and merges
+DCP diapasons when new direntries are created and old direntries are
+deleted or changed.
+
+The following constants are defined for directory checkpoints:
+\begin{enumerate}
+\item $E_{DCP}^{min}$ -- the minimal number of entries a DCP may
+refer to;
+\item $E_{DCP}^{max}$ -- the maximal number of entries a DCP may refer to.
+\end{enumerate}
+
+The DCP node's physical size is restricted by the flash page size. Hence,
+assuming $E_{DCP}^{min} = E_{DCP}^{max}/3$,
+$E_{DCP}^{min}$ and $E_{DCP}^{max}$ for different flash page sizes are:
+
+\begin{center}
+\begin{tabular}{lll}
+\textbf{Flash page size} & $\bf E_{DCP}^{min}$ & $\bf E_{DCP}^{max}$\\
+\hline
+512 bytes & 20 & 60\\
+2048 bytes & 84 & 252\\
+4096 bytes & 169 & 508\\
+\end{tabular}
+\end{center}
+
+The JFFS3 DCP merging/splitting algorithm works as follows:
+\begin{itemize}
+\item as long as there are few direntries in the directory (i.e., fewer than
+$E_{DCP}^{max}$), JFFS3 maintains only one DCP comprising the whole
+\mbox{32-bit} hash diapason;
+
+\item when more direntries are added, JFFS3 creates more directory
+checkpoints splitting the hash diapason;
+
+\item JFFS3 inserts direntries into the DCP with the corresponding hash
+diapason; when the number of direntries associated with the DCP becomes
+too large (i.e., greater than $E_{DCP}^{max}$), the DCP is split into two
+directory checkpoints and the hash diapason is split correspondingly;
+
+\item when directory entries are deleted, the DCPs which correspond to
+their diapasons become smaller; when a DCP becomes too small (i.e., the
+number of direntry nodes it refers to becomes less than $E_{DCP}^{min}$),
+JFFS3 either merges it with one of its small neighbors or splits one of the
+neighbors and then merges the two adjacent small diapasons.
+\end{itemize}
+
+When several direntries have colliding $H$ values, JFFS3 puts them
+into the same diapason. If the number of colliding direntries becomes
+equal to $E_{DCP}^{max}$, JFFS3 will refuse to add one more direntry
+with the colliding hash value. This implies there is a restriction on the
+number of direntries in a directory.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%
@@ -955,8 +1041,7 @@
\item each data node involves some additional flash space overhead
because of its header;
-\item JFFS2 needs more RAM to keep track of nodes; this is the worst
-drawback, see (\ref{ref_JFFS2_and_mem}) for more information about this.
+\item JFFS2 needs more RAM to keep track of nodes.
\end{itemize}
@@ -1135,9 +1220,8 @@
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Credits}
-David Woodhouse\\
Joern Engel\\
-Thomas Gleixner\\
+David Woodhouse\\
Josh Boyer\\
Ferenc Havasi\\
Joakim Tjernlund\\