[PATCH v7 12/13] ext4: switch to multigrain timestamps

Tue Sep 19 13:46:25 PDT 2023

On Tue, 2023-09-19 at 13:10 -0700, Paul Eggert wrote:
> On 2023-09-19 09:31, Jeff Layton wrote:
> > The typical case for make
> > timestamp comparisons is comparing source files vs. a build target. If
> > those are being written nearly simultaneously, then that could be an
> > issue, but is that a typical behavior?
> 
> I vaguely remember running into problems with 'make' a while ago 
> (perhaps with a BSDish system) when filesystem timestamps were 
> arbitrarily truncated in some cases but not others. These files would 
> look older than they really were, so 'make' would think they were 
> up-to-date when they weren't, and 'make' would omit actions that it 
> should have done, thus screwing up the build.
> 
> File timestamps can be close together with 'make -j' on fast hosts. 
> Sometimes a shell script (or 'make' itself) will run 'make', then modify 
> a file F, then immediately run 'make' again; the latter 'make' won't 
> work if F's timestamp is mistakenly older than targets that depend on it.
> 
> Although 'make'-like apps are the biggest canaries in this coal mine, 
> the issue also affects 'find -newer' (as Bruno mentioned), 'rsync -u', 
> 'mv -u', 'tar -u', Emacs file-newer-than-file-p, and surely many other 
> places. For example, any app that creates a timestamp file, then backs 
> up all files newer than that file, would be at risk.
> 
> 
> > I wonder if it would be feasible to just advance the coarse-grained
> > current_time whenever we end up updating a ctime with a fine-grained
> > timestamp?
> 
> Wouldn't this need to be done globally, that is, not just on a per-file 
> or per-filesystem basis? If so, I don't see how we'd avoid locking 
> performance issues.
> 

Maybe. Another idea might be to introduce a new timekeeper for
multigrain filesystems, but all of those would likely have to share the
same coarse-grained clock source.

So yeah, if you stat an inode and then update it, any inode written on a
multigrain filesystem within the same jiffy-sized window would have to
log an extra transaction to write out the inode. That's what I meant
when I was talking about write amplification.

> 
> PS. Although I'm no expert in the Linux inode code I hope you don't mind 
> my asking a question about this part of inode_set_ctime_current:
> 
> 	/*
> 	 * If we've recently updated with a fine-grained timestamp,
> 	 * then the coarse-grained one may still be earlier than the
> 	 * existing ctime. Just keep the existing value if so.
> 	 */
> 	ctime.tv_sec = inode->__i_ctime.tv_sec;
> 	if (timespec64_compare(&ctime, &now) > 0)
> 		return ctime;
> 
> Suppose root used clock_settime to set the clock backwards. Won't this 
> code incorrectly refuse to update the file's timestamp afterwards? That 
> is, shouldn't the last line be "goto fine_grained;" rather than "return 
> ctime;", with the comment changed from "keep the existing value" to "use 
> a fine-grained value"?

It is a problem, and Linus pointed that out yesterday, which is why I
sent this earlier today:

https://lore.kernel.org/linux-fsdevel/20230919-ctime-v1-1-97b3da92f504@kernel.org/T/#u

Bear in mind that we're not dealing with a situation where the value has
not been queried since its last update, so we don't need to use a fine
grained timestamp there (and really, it's preferable not to do so). A
coarse one should be fine in this case.
-- 
Jeff Layton <jlayton at kernel.org>