atomic file operations

Wed Mar 23 15:50:52 EST 2005

Estelle,

Thanks, I appreciate your help.

> 
> Sergei Sharonov wrote:
> > Is a write of 1024 bytes atomic?
> > Does it relate to the page size in any way? BTW I am using NAND and the 
> > page may vary between 512 and 2048 bytes depending on a device.
> 
> No write operation is guaranteed to be atomic. Have a look
> at jffs2_write_inode_range in write.c : if there is not enough
> space in the current block for the whole data, it may be split
> into several chunks. Additionally write ops that overlap a
> cache page boundary (not a flash page) are always split at 
> the page limit.

Does that mean a single write may end up with several CRCs, one for
each split chunk?

> If you want to have atomic writes, you could:
> 1) Mandatorily: ensure that your application will not
> issue write ops which overlap a page boundary. 
> You should not tweak the JFFS2 code to write such 
> overlapping nodes, otherwise you must also tweak 
> the GC and it gets difficult.
> 2) Either tweak jffs2_write_inode_range to forbid 
> splitting data which does not overlap a page boundary
> or adjust JFFS2_MIN_DATA_LEN to reserve enough 
> space (difficult to estimate maybe if you have
> compression...).
> 
> The above tweaking should ensure that an input buffer
> is written to JFFS2 FS as a single CRC-protected
> data node.
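
For point 1, here is a minimal application-side sketch of how a logger
could check that a record stays inside one page before issuing the
write. The helper names are hypothetical, and page_size is an assumption:
the caller would pass something like sysconf(_SC_PAGESIZE), on the
assumption that it matches the page cache page size that JFFS2 splits on.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Return 1 if a write of `len` bytes at file offset `off` would span
 * a page boundary, 0 otherwise. (Hypothetical helper.) */
static int crosses_page(off_t off, size_t len, long page_size)
{
    assert(off >= 0 && page_size > 0);
    if (len == 0)
        return 0;
    /* Compare the page index of the first and the last byte. */
    return (off / page_size) != ((off + (off_t)len - 1) / page_size);
}

/* Return the first offset >= `off` at which a record of `len` bytes
 * does not span a page boundary. A record longer than one page can
 * never fit, so the caller must keep len <= page_size. */
static off_t next_safe_offset(off_t off, size_t len, long page_size)
{
    if (!crosses_page(off, len, page_size))
        return off;
    /* Skip to the start of the next page. */
    return (off / page_size + 1) * page_size;
}
```

With 1024-byte records and a 4096-byte page, records that start at
multiples of 1024 never cross a boundary, so no padding would be needed
in that case.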

OK, got that. It does not seem like a promising idea, considering
how fast JFFS2 evolves and therefore how painful maintaining a fork
would be. Thanks for the suggestion anyway.

> You should be aware that on NAND flash JFFS2 uses
> a (nand flash) page buffer (wbuf.c), which is flushed 
> only on fsync/sync/umount. So even though your write
> ops will be atomic (with above code tweaks), 
> there is no guarantee that a buffer is effectively 
> committed to flash when write() returns, because the
> end of the data node may remain in the buffer.
> If you want that also, you can tweak JFFS2 again 
> by requiring a  wbuf flush after each "atomic write", 
> or you can have your application call fsync after 
> each write.

Beg pardon if this is a FAQ, but if I open the file with the O_SYNC
flag, wouldn't that guarantee a synchronous write that does not
return until all the data is in flash?
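
To make the two options concrete, here is a rough sketch of what I mean
on the application side: either open with O_SYNC, or call fsync() after
each write as suggested above. The function names and the append-only
log layout are my own assumptions, and whether O_SYNC really forces the
wbuf flush on JFFS2 is exactly the open question.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Open an append-only log file, optionally with O_SYNC.
 * (Hypothetical helper; returns the fd or -1 on error.) */
static int open_log(const char *path, int use_osync)
{
    int flags = O_WRONLY | O_CREAT | O_APPEND;

    if (use_osync)
        flags |= O_SYNC;
    return open(path, flags, 0644);
}

/* Write the whole buffer, then fsync(). Returns 0 on success, -1 on
 * error. A short write is retried, which means the record may reach
 * the filesystem as more than one write() call. */
static int write_and_sync(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before writing: retry */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return fsync(fd);
}
```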

> > Is file rename atomic?
> See jffs2_rename in dir.c. There are two steps:
> make the new hard link, remove the old hard link.
> You may end up with two names for the same inode if
> there is a powerdown, so no it is not atomic.

I did not see that coming. People usually assume that rename is
atomic.

> > Second issue is: How badly these small chunks will affect my mount time?
> There have been previous threads about this.
> Some people proposed some (application-side) workaround, 
> you can find it in the archive or maybe someone will point 
> it to you.

I believe I saw a proposal to save small chunks as separate files, then
append them to a temp file and rename the temp file to the real log file.
The problems are (1) the log file is huge and (2) rename is not atomic,
per your reply.
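
For reference, the last step of that proposal would look roughly like
the sketch below: write the new contents to a temp file, fsync it, then
rename it over the real file. The function name and paths are
hypothetical. Per the reply above, on JFFS2 a power loss between the two
steps of the rename may leave both names pointing at the same inode, so
startup code would still have to cope with a leftover temp name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Write `len` bytes to `tmp_path`, sync them, then rename the temp
 * file over `path`. Returns 0 on success, -1 on error; on error the
 * temp file is removed. (Hypothetical helper.) */
static int replace_via_rename(const char *path, const char *tmp_path,
                              const void *buf, size_t len)
{
    const char *p = buf;
    int fd;

    assert(path && tmp_path && (buf || len == 0));
    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n <= 0)
            goto fail_close;    /* treat any failed write as fatal */
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) != 0)         /* data must be on flash before rename */
        goto fail_close;
    if (close(fd) != 0)
        goto fail_unlink;
    if (rename(tmp_path, path) != 0)
        goto fail_unlink;
    return 0;

fail_close:
    close(fd);
fail_unlink:
    unlink(tmp_path);
    return -1;
}
```

This does nothing about problem (1): the whole log is rewritten on every
update, which is the part that does not scale for a huge file.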

Sergei Sharonov