About GC
David Woodhouse
dwmw2 at infradead.org
Fri Sep 13 03:59:31 EDT 2002
(redirected to jffs list)
startec at ms11.hinet.net said:
> The recent CVS code has a great improvement in mounting time. It's
> great. I tested it with a 32 MB NAND flash and the mounting time
> dropped to 10 seconds (the original time was 50 seconds).
We can probably do better than that. I think we're still not page-aligning
our reads during scan.
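For illustration, page-aligning the scan reads just means rounding each
read out to whole flash pages, so the scan never issues a partial-page
read. A minimal userspace sketch (the function name and page size are
invented for illustration; the real page size comes from the MTD driver):

#include <stdint.h>
#include <stdio.h>

/* Hypothetical page size; the real value comes from the MTD driver. */
#define NAND_PAGE_SIZE 512

/* Round an arbitrary (offset, length) request out to whole flash pages
 * so the scan never issues a partial-page read. Illustrative only. */
static void page_align_read(uint32_t ofs, uint32_t len,
			    uint32_t *aligned_ofs, uint32_t *aligned_len)
{
	uint32_t start = ofs & ~(uint32_t)(NAND_PAGE_SIZE - 1);
	uint32_t end = (ofs + len + NAND_PAGE_SIZE - 1) &
		       ~(uint32_t)(NAND_PAGE_SIZE - 1);

	*aligned_ofs = start;
	*aligned_len = end - start;
}

int main(void)
{
	uint32_t ofs, len;

	/* A 68-byte node header at offset 500 straddles a page boundary,
	 * so it becomes one aligned read of two whole pages. */
	page_align_read(500, 68, &ofs, &len);
	printf("read %u bytes at offset %u\n", (unsigned)len, (unsigned)ofs);
	return 0;
}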
> After mounting, I found that the GC thread takes most of the CPU
> time (99.9%) in my system for a while. How can I make
> jffs2_garbage_collection_pass use less CPU time?
(not really answering the question but I've written it now...)
Well, telling me it takes 99% CPU time isn't wonderfully useful. What's
more useful is telling me _what_ it's doing. But as it happens, I was
looking at that yesterday. http://www.infradead.org/~dwmw2/holey-profile
is a profile run from a couple of minutes of GC-intensive writes on an
fs which is about 80% full.
We already have code to mark nodes as 'pristine' when they can be copied
intact without having to iget the inode to which they belong and then read
and rewrite the data. That will help a lot with memory usage (far less
thrashing of icache) and allow us to remove the zlib traces from the
profile. (You don't see the read_inode time in the trace because the icache
was already fully populated with _every_ inode in the fs before I started).
However, the amount of time spent in zlib decompressing and then
recompressing each node we GC isn't actually as much as I thought it was.
We could possibly get 10% improvement when we finish that code and make the
GC use it, but not a lot more, AFAICT.
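To illustrate what the pristine path buys us, here's a minimal userspace
sketch (the structure and function names are made up; the real JFFS2 node
handling is rather more involved). A pristine node gets copied to the new
block byte-for-byte; everything else goes through the existing
iget/inflate/deflate rewrite:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative node record; not the real JFFS2 structures. */
struct raw_node {
	uint32_t totlen;	/* length of the node on flash */
	int	 pristine;	/* complete, valid data: safe to copy verbatim */
	uint8_t	 data[4096];	/* raw (compressed) node as it sits on flash */
};

/* Fast path: a pristine node is copied byte-for-byte to the new block,
 * with no iget(), no zlib inflate/deflate and no icache pressure. */
static size_t gc_copy_pristine(const struct raw_node *n, uint8_t *dst)
{
	memcpy(dst, n->data, n->totlen);
	return n->totlen;
}

/* Slow-path stand-in: the current GC has to look up the inode, inflate
 * the data and deflate it again before writing the replacement node. */
static size_t gc_rewrite_node(const struct raw_node *n, uint8_t *dst)
{
	memcpy(dst, n->data, n->totlen);	/* placeholder for the rewrite */
	return n->totlen;
}

int main(void)
{
	static struct raw_node n = { .totlen = 64, .pristine = 1 };
	uint8_t out[4096];
	size_t written;

	written = n.pristine ? gc_copy_pristine(&n, out)
			     : gc_rewrite_node(&n, out);
	printf("wrote %zu bytes\n", written);
	return 0;
}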
The vast majority of the time is spent in __delay, which will have been
called from the erase routine. The logic there is "if(need_resched()) do_so()
else udelay()", so on an unloaded system it will hog your CPU and check for
completion more frequently than once per jiffy, but if there's other stuff
to run it'll be kinder.
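In kernel-flavoured C the wait loop amounts to something like this (a
sketch only, not the actual driver source; the status-poll function is a
made-up stub):

#include <linux/delay.h>
#include <linux/sched.h>

/* Hypothetical chip-status poll, stubbed out here; the real check reads
 * the flash status register through the chip driver. */
static int chip_erase_complete(void)
{
	return 1;
}

/* Sketch of the wait-for-erase logic described above.  On an idle system
 * need_resched() never fires, so we sit in udelay() and poll the chip far
 * more often than once per jiffy -- which is why the GC thread appears to
 * eat 99% of the CPU.  Under load we give the CPU back instead. */
static void wait_for_erase(void)
{
	while (!chip_erase_complete()) {
		if (need_resched())
			schedule();	/* let other work run */
		else
			udelay(100);	/* busy-wait, poll again very soon */
	}
}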
I don't think there's anything I can do there locally -- we're waiting for
hardware. What we need to do is ensure that we erase less. At the moment,
we have a single block to which we are currently writing. GC'd nodes get
written there mixed up with new nodes from user writes. The former
are likely to be static, long-lived data, while the latter are
more likely to be volatile. The result is that we tend to end up with a lot
of erase blocks which are about half full of long-lived data and half
dirty. For each pair of those, what we _want_ is one completely full, clean
block and one completely dirty one.
We can probably get much closer to that ideal by splitting up the writes.
If we have two blocks 'on the go' at a time, one of which is taking new
writes from the user, the other of which is taking GC'd nodes from elsewhere
with older data, we will tend to group clean and dirty stuff more usefully,
and hence have to do less erasing and copying to make progress when we come
to GC.
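A minimal sketch of that arrangement (names invented for illustration; the
real jffs2 structures and reservation code are rather more involved):

#include <stdint.h>
#include <stdio.h>

/* Illustrative erase-block descriptor; the real one carries much more state. */
struct eraseblock {
	uint32_t offset;
	uint32_t free_size;
};

/* One block takes fresh writes from the user, the other takes nodes the
 * garbage collector is moving.  Keeping the two streams apart means
 * long-lived (GC'd) data and volatile (new) data land in different blocks,
 * so each block tends to end up either wholly clean or wholly dirty. */
struct alloc_state {
	struct eraseblock *nextblock;	/* new data from user writes */
	struct eraseblock *gcblock;	/* nodes being moved by GC */
};

/* Hypothetical allocation helper: pick the target block according to who
 * is writing. */
static struct eraseblock *pick_block(struct alloc_state *c, int for_gc)
{
	return for_gc ? c->gcblock : c->nextblock;
}

int main(void)
{
	struct eraseblock user_blk = { .offset = 0x00000, .free_size = 65536 };
	struct eraseblock gc_blk = { .offset = 0x10000, .free_size = 65536 };
	struct alloc_state c = { .nextblock = &user_blk, .gcblock = &gc_blk };

	printf("user write -> block at 0x%05x\n",
	       (unsigned)pick_block(&c, 0)->offset);
	printf("GC'd node  -> block at 0x%05x\n",
	       (unsigned)pick_block(&c, 1)->offset);
	return 0;
}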
We already have separate allocation routines for GC writes anyway, for other
reasons, so implementing this shouldn't be too painful. It's just a case of
convincing myself it's actually going to be worth it and getting round to it
-- as ever, in the absence of customers causing my boss to schedule my time
for it, it has to wait till I'm sufficiently disgusted by what I'm
_supposed_ to be working on that I steal enough cycles to play with it.
--
dwmw2