atomic file operations

Sergei Sharonov sergei.sharonov at halliburton.com
Wed Mar 23 15:50:52 EST 2005


Estelle,

Thanks, I appreciate your help.

> 
> Sergei Sharonov wrote:
> > Is a write of 1024 bytes atomic?
> > Does it relate to the page size in any way? BTW I am using NAND and the 
> > page may vary between 512 and 2048 bytes depending on a device.
> 
> No write operation is guaranteed to be atomic. Have a look
> at jffs2_write_inode_range in write.c : if there is not enough
> space in the current block for the whole data, it may be split
> into several chunks. Additionally write ops that overlap a
> cache page boundary (not a flash page) are always split at 
> the page limit.

Does that mean a single write may end up with several CRCs, one for
each chunk it was split into?
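
To make the page-boundary split concrete, here is a minimal sketch of
how a byte range breaks up at cache page boundaries. It is not the
JFFS2 code: CACHE_PAGE_SIZE (assumed 4096), struct chunk and
split_at_page_boundaries() are names made up for illustration, and it
ignores the other split case (not enough space left in the current
block).

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: the cache page is 4096 bytes; check your platform. */
#define CACHE_PAGE_SIZE 4096u

/* Hypothetical type: one piece of a write after splitting. */
struct chunk {
    size_t offset;
    size_t len;
};

/*
 * Split the byte range [offset, offset + len) at every cache page
 * boundary. Stores at most max_chunks pieces in out[] and returns the
 * number of pieces the range needs (which may exceed max_chunks).
 */
static size_t split_at_page_boundaries(size_t offset, size_t len,
                                       struct chunk *out, size_t max_chunks)
{
    size_t n = 0;

    assert(out != NULL || max_chunks == 0);

    while (len > 0) {
        /* Bytes left before the next page boundary. */
        size_t room = CACHE_PAGE_SIZE - (offset % CACHE_PAGE_SIZE);
        size_t this_len = len < room ? len : room;

        if (n < max_chunks) {
            out[n].offset = offset;
            out[n].len = this_len;
        }
        n++;
        offset += this_len;
        len -= this_len;
    }
    return n;
}
```

With these assumptions, a 1024-byte write at offset 0 stays in one
piece, while the same write at offset 3584 is cut into two 512-byte
pieces.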

> If you want to have atomic writes, you could:
> 1) Mandatorily: ensure that your application will not
> issue write ops which overlap a page boundary. 
> You should not tweak the JFFS2 code to write such 
> overlapping nodes, otherwise you must also tweak 
> the GC and it gets difficult.
> 2) Either tweak jffs2_write_inode_range to forbid 
> splitting data which does not overlap a page boundary
> or adjust JFFS2_MIN_DATA_LEN to reserve enough 
> space (difficult to estimate maybe if you have
> compression...).
> 
> The above tweaking should ensure that an input buffer
> is written to JFFS2 FS as a single CRC-protected
> data node.

OK, got that. It does not seem like a promising idea, considering
how fast JFFS2 evolves and therefore how painful maintaining a fork
would be. Thanks for the suggestion anyway.
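
Point 1 above (never issue a write that overlaps a page boundary) can
be handled on the application side without touching JFFS2. A minimal
sketch follows; fits_in_one_page() and pad_to_next_page() are
hypothetical helpers, and page_size is assumed to be the cache page
size, not the flash page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if [offset, offset + len) lies inside one page of page_size bytes. */
static bool fits_in_one_page(size_t offset, size_t len, size_t page_size)
{
    assert(page_size > 0);

    if (len == 0)
        return true;
    if (len > page_size)
        return false;
    /* First and last byte must fall in the same page. */
    return offset / page_size == (offset + len - 1) / page_size;
}

/*
 * Number of filler bytes to write at offset so that a record of len
 * bytes starts on the next page instead of crossing a boundary.
 * Returns 0 if the record already fits. The caller is expected to keep
 * len <= page_size; a longer record can never fit in one page.
 */
static size_t pad_to_next_page(size_t offset, size_t len, size_t page_size)
{
    if (fits_in_one_page(offset, len, page_size))
        return 0;
    return page_size - (offset % page_size);
}
```

The padding wastes some space in the log, which is the price for not
patching the filesystem.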

> You should be aware that on NAND flash JFFS2 uses
> a (nand flash) page buffer (wbuf.c), which is flushed 
> only on fsync/sync/umount. So even though your write
> ops will be atomic (with above code tweaks), 
> there is no guarantee that a buffer is effectively 
> committed to flash when write() returns, because the
> end of the data node may remain in the buffer.
> If you want that also, you can tweak JFFS2 again 
> by requiring a  wbuf flush after each "atomic write", 
> or you can have your application call fsync after 
> each write.

I beg your pardon if this is a FAQ, but if I open the file with the
O_SYNC flag, wouldn't that guarantee a synchronous write that does not
return until all the data is in flash?
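
For reference, the two application-side options look roughly like
this. It is only a sketch: open_log() and write_record_synced() are
made-up names, and whether O_SYNC on JFFS2 actually flushes the NAND
write buffer is exactly the open question here, so the fsync() path is
the one the reply above explicitly suggests.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Open a log for appending; optionally request synchronous writes. */
static int open_log(const char *path, int use_o_sync)
{
    int flags = O_WRONLY | O_CREAT | O_APPEND;

    if (use_o_sync)
        flags |= O_SYNC; /* assumption: the filesystem honours it */
    return open(path, flags, 0644);
}

/*
 * Issue one write() for the whole record, then fsync().
 * A short write is reported as an error rather than retried, because a
 * second write() would defeat the point of a single-node record.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_record_synced(int fd, const void *buf, size_t len)
{
    ssize_t n;

    assert(buf != NULL || len == 0);

    do {
        n = write(fd, buf, len);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -1;
    if ((size_t)n != len) {
        errno = EIO;
        return -1;
    }
    return fsync(fd);
}
```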

> > Is file rename atomic?
> See jffs2_rename in dir.c. There are two steps:
> make the new hard link, remove the old hard link.
> You may end up with two names for the same inode if
> there is a powerdown, so no it is not atomic.

I did not see that coming. People usually assume that rename is
atomic.
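
If a powerdown can leave two names for the same inode, an application
could check for that at startup and finish the job itself. A minimal
sketch, assuming the application knows which old/new pair it was
renaming; finish_interrupted_rename() is a hypothetical helper, not
something JFFS2 provides.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * If old_path and new_path are two links to the same inode, remove the
 * old one. Returns 1 if the old link was removed, 0 if there was
 * nothing to do, -1 on error.
 */
static int finish_interrupted_rename(const char *old_path,
                                     const char *new_path)
{
    struct stat so, sn;

    assert(old_path != NULL && new_path != NULL);

    if (lstat(old_path, &so) != 0)
        return errno == ENOENT ? 0 : -1; /* old name gone: rename done */
    if (lstat(new_path, &sn) != 0)
        return errno == ENOENT ? 0 : -1; /* new name missing: not started */

    /* Same device and inode number means both names are one file. */
    if (so.st_dev == sn.st_dev && so.st_ino == sn.st_ino) {
        if (unlink(old_path) != 0) {
            perror("unlink");
            return -1;
        }
        return 1;
    }
    return 0;
}
```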

> > Second issue is: How badly these small chunks will affect my mount time?
> There have been previous threads about this.
> Some people proposed some (application-side) workaround, 
> you can find it in the archive or maybe someone will point 
> it to you.

I believe I saw a proposal to save the small chunks as separate files,
then append them to a temp file and rename the temp file to the real
log file. The problems are (1) the log file is huge, and (2) rename is
not atomic, per your reply.
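
As I understand that proposal, it would look something like the sketch
below. This is my reading of it, not the code from the archive;
append_file() and merge_chunks() are made-up names. It also shows
problem (1): the whole existing log is copied on every merge.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Append the whole contents of src_path to the open descriptor dst_fd. */
static int append_file(int dst_fd, const char *src_path)
{
    char buf[512];
    ssize_t n;
    int src = open(src_path, O_RDONLY);

    if (src < 0)
        return -1;
    while ((n = read(src, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(src);
                return -1;
            }
            off += w;
        }
    }
    close(src);
    return n < 0 ? -1 : 0;
}

/*
 * Build tmp_path from the current log plus all chunk files, sync it,
 * rename it over log_path, then delete the chunk files.
 * Returns 0 on success, -1 on error.
 */
static int merge_chunks(const char *log_path, const char *tmp_path,
                        const char *const *chunks, size_t nchunks)
{
    size_t i;
    int fd;

    assert(log_path != NULL && tmp_path != NULL);

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    /* Copy the existing log first, if there is one. */
    if (access(log_path, F_OK) == 0 && append_file(fd, log_path) != 0)
        goto fail;
    for (i = 0; i < nchunks; i++)
        if (append_file(fd, chunks[i]) != 0)
            goto fail;
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0 || rename(tmp_path, log_path) != 0) {
        unlink(tmp_path);
        return -1;
    }
    for (i = 0; i < nchunks; i++)
        unlink(chunks[i]);
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}
```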
 
Sergei Sharonov




