[RFC][PATCH] ubi: Implement a read cache

Boris Brezillon boris.brezillon at free-electrons.com
Mon Jul 18 07:21:41 PDT 2016


Hi Richard,

On Fri, 15 Jul 2016 14:57:34 +0200
Richard Weinberger <richard at nod.at> wrote:

> Implement a simple read cache such that small adjacent reads
> within the same NAND page can be served from the cache.
> When a read smaller than a NAND page is requested, UBI reads the full
> page and caches it.
> The NAND core already has such a cache, but it will go away soon.
> 
> To be able to benchmark better, there is also a debugfs file which reports
> the cache stats: hits, misses and no_query.
> Hits and misses should be clear; no_query is incremented when the cache
> cannot be used, i.e. for reads larger than a page.
> 
> This patch is just a PoC.
> 

Just a few comments on the approach. I'll try to review the code later.

> Signed-off-by: Richard Weinberger <richard at nod.at>
> ---
> Hi!
> 
> This is an initial draft for a read cache implemented in UBI.
> 
> The cache sits in ubi_eba_read_leb() and is used when the read length is
> smaller than a NAND page and does not cross a page boundary; this can be
> improved later.

Hm, we should probably use the cache for aligned accesses and
multi-page reads as well.

> Currently the cache consists of 61 cache lines where the PEB acts
> as selector.

If we go for a multi-page cache, we should probably use an LRU cache
instead of this 'simple PEB hash + no collision' approach. Remember
that reading a page from NAND is more time consuming than iterating
over a small array of NAND page buffers.
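
To give an idea of what I mean, the lookup could be as simple as the
sketch below. This is purely illustrative: the ubi_rcache_* names and
the ubi->rcache_lru field don't exist anywhere, and locking is left
out.

#include <linux/list.h>
#include "ubi.h"

struct ubi_rcache_line {
	struct list_head lru;	/* most recently used line sits at the head */
	int pnum;		/* PEB the cached write unit belongs to */
	int offset;		/* write-unit aligned offset inside the PEB */
	void *buf;		/* ubi->min_io_size bytes */
};

/* Look up a cached write unit; on a hit, promote the line to the LRU head. */
static struct ubi_rcache_line *
ubi_rcache_find(struct ubi_device *ubi, int pnum, int offset)
{
	struct ubi_rcache_line *line;

	list_for_each_entry(line, &ubi->rcache_lru, lru) {
		if (line->pnum == pnum && line->offset == offset) {
			list_move(&line->lru, &ubi->rcache_lru);
			return line;
		}
	}

	/* Miss: the caller reads the page and recycles the LRU tail. */
	return NULL;
}

Even with a few dozen lines, walking this list is cheap compared to a
NAND page read, and eviction naturally drops the least recently used
buffer instead of whatever happens to collide in the hash.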

> Maybe fewer cache lines would also do it; this needs to be benchmarked
> in more detail.

See my comment below.

> 
> I did already some benchmarks.
> The system I've used has a 4GiB MLC NAND with 16KiB page size.
> Userspace is a Debian 8 with systemd.
> 
> The following tasks have been benchmarked so far.
> 1. system bootup (userspace time as reported by systemd-analyze)
> 2. walking /usr; this involves a lot of TNC lookups, so it is interesting
> 3. removing a 1GiB file, which also involves a lot of TNC lookups
> 
> Baseline
> ========
> 
> 1. bootup time: 18 seconds
> 
> 2. time find /usr > /dev/null
> real    0m19.869s
> user    0m0.120s
> sys     0m7.300s
> 
> 3. time rm /root/viel
> real    0m32.992s
> user    0m0.000s
> sys     0m12.380s
> 
> While the results for 1 and 2 are okay, we observe that removing the file
> took a rather long time.
> The NAND driver on this target supports subpage reads and therefore the NAND
> page cache was never used.
> Since removing a large file leads to a lot of TNC lookups, UBIFS issues many
> reads of around 200 bytes each, and the overhead of every single mtd_read()
> kicks in.
> 
> Baseline + NAND cache (i.e. subpage read support disabled in our driver)
> ==========================================================================
> 
> 1. bootup time: 19.5 seconds
> 
> 2. time find /usr > /dev/null
> real    0m24.555s
> user    0m0.150s
> sys     0m8.220s
> 
> 3. time rm /root/viel
> real    0m5.327s
> user    0m0.000s
> sys     0m3.300s
> 
> We observe that the bootup took a little longer, but the jitter here is about one
> second.
> Walking /usr took about 5 seconds longer; I'm not sure why, so this needs more inspection.
> 
> The most interesting result is that removing the large file took only 5 seconds;
> compared to 33 seconds this is a huge speedup.
> Since the znodes of the same file are adjacent data structures on the flash,
> the cache at the NAND level helped a lot.
> 
> UBI read cache with 1 cache line
> ================================
> 
> 1. bootup time: 19.5 seconds

Hm, I would expect an attach time close to what we have in Baseline
with subpage read. Do you know where the difference comes from?

> cache usage:
> hits: 1546
> misses: 6887
> no_query: 649
> 
> 
> 2. time find /usr > /dev/null
> real    0m24.297s
> user    0m0.100s
> sys     0m7.800s
> 
> cache usage:
> hits: 4068
> misses: 17669
> no_query: 107
> 
> 3. time rm /root/viel
> real    0m5.457s
> user    0m0.000s
> sys     0m2.730s
> 
> cache usage:
> hits: 34517
> misses: 2490
> no_query: 212
> 
> The results are more or less the same as with caching on the NAND side.
> So, we could drop the NAND cache and implement it in UBI instead.
> As expected, the cache helps most when deleting large files.
> 
> UBI read cache with 61 cache lines
> ==================================
> 
> 1. bootup time: 17.5 seconds
> hits: 2991
> misses: 5436
> no_query: 649
> 
> 2. time find /usr > /dev/null
> real    0m20.557s
> user    0m0.120s
> sys     0m7.080s
> 
> hits: 7064
> misses: 14145
> no_query: 116
> 
> 3. time rm /root/viel
> real    0m5.244s
> user    0m0.000s
> sys     0m2.840s
> 
> hits: 34228
> misses: 2248
> no_query: 202
> 
> With more cache lines we can reduce the boot time a bit; we also observe
> that, compared to a single line, the cache hit rate is better.
> Same for walking /usr.
> Removing a large file is also fast.
> So, having more cache lines helps, but we first need to figure out how
> many lines make sense.

That's true, but if we go for an adjustable UBI write-unit cache, I'd
prefer an implementation relying on the page-cache mechanism, so
that the system can adjust the cache size (by reclaiming cached pages)
on demand instead of hardcoding the number of cache lines at compile
time.
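
If hooking the cache into the page cache proper turns out to be too
invasive, a lighter way to get the same "reclaim on demand" behaviour
would be to register a shrinker for the cache lines. Rough sketch only,
the ubi_rcache_* helpers are invented:

#include <linux/shrinker.h>

static unsigned long ubi_rcache_count(struct shrinker *shrink,
				      struct shrink_control *sc)
{
	/* Number of cache lines currently held (hypothetical helper). */
	return ubi_rcache_nr_lines();
}

static unsigned long ubi_rcache_scan(struct shrinker *shrink,
				     struct shrink_control *sc)
{
	/* Drop up to sc->nr_to_scan lines, starting at the LRU tail. */
	return ubi_rcache_evict(sc->nr_to_scan);
}

static struct shrinker ubi_rcache_shrinker = {
	.count_objects	= ubi_rcache_count,
	.scan_objects	= ubi_rcache_scan,
	.seeks		= DEFAULT_SEEKS,
};

/* register_shrinker(&ubi_rcache_shrinker) at UBI init time. */

UBIFS already does something similar for the TNC, so at least it would
be consistent with the rest of the stack.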

IMO, we should either implement a basic 'single page cache' mechanism
at the UBI level to replace the one at the NAND level and still benefit
from subpage reads for EC and VID headers, or go for the more complex
page-cache based mechanism.
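
For reference, option 1 would not need much more than something like
this in the ubi_eba_read_leb() path. Again just a sketch: the rc_*
fields don't exist, locking is omitted, and the caller is assumed to
have checked that the request fits in one write unit.

/*
 * Serve a small read from a single cached write unit, refilling the
 * cache on a miss. [offset, offset + len) must stay within one write
 * unit of PEB pnum.
 */
static int ubi_rcache_read(struct ubi_device *ubi, int pnum, void *buf,
			   int offset, int len)
{
	int page_off = rounddown(offset, ubi->min_io_size);
	int err;

	if (ubi->rc_valid && ubi->rc_pnum == pnum &&
	    ubi->rc_offset == page_off) {
		/* Hit: copy straight out of the cached write unit. */
		memcpy(buf, ubi->rc_buf + (offset - page_off), len);
		return 0;
	}

	/* Miss: read the whole write unit and refill the cache. */
	err = ubi_io_read_data(ubi, ubi->rc_buf, pnum, page_off,
			       ubi->min_io_size);
	if (err && err != UBI_IO_BITFLIPS) {
		ubi->rc_valid = false;
		return err;
	}

	ubi->rc_pnum = pnum;
	ubi->rc_offset = page_off;
	ubi->rc_valid = true;

	memcpy(buf, ubi->rc_buf + (offset - page_off), len);
	return err;
}

The cache would of course have to be invalidated whenever the PEB is
written, erased or unmapped.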

Artem, Brian, any opinion?

BTW, this implementation still assumes that the upper layer will try to
fill NAND pages entirely. This seems to be true for UBIFS, so I'm not
sure we should bother about other cases now, but I'd still like to have
your opinions.

Regards,

Boris


