[RESEND] Benchmark of I/O multiplexing with LZO/snappy compression
HATAYAMA Daisuke
d.hatayama at jp.fujitsu.com
Mon Jun 18 02:38:24 EDT 2012
Hello,
I evaluated I/O multiplexing together with parallel compression in two
formats: lzo and snappy.
In summary:
- With 8-way I/O multiplexing, throughput is about 5 times that of the
  1-way case: with snappy, a 1TB copy takes 25min.
- For randomized data, snappy is as quick as raw, i.e. the
  no-compression case.
- lzo consumes more CPU time than snappy, but it could do better with
  faster CPUs and sparser data; a different kind of benchmark is
  needed to confirm this.
Any comments are appreciated.
Notice: I'm sorry that I cannot attach the source files used in this
benchmark, fakevmcore.c and nsplit.c, because the mail server flags
them with a ``suspicious header'' warning; the mail then needs a
moderator's approval to be posted, but no one has reacted... So I am
resending this mail without any attachments.
* Environments
- PRIMEQUEST 1800E2
- CPU: Intel Xeon E7-8870 (10core/2.4GHz) x 2 sockets
- RAM: 32GB
- DISKS
- MBD2147RC (10025rpm) x 4
- ETERNUS DX440: Emulex 8Gb/s Fibre Channel adapters x 4
(*) To get 8-way I/O multiplexing, I used 4 local disks and 4 SAN
LUNs, simply because I didn't have enough disks available (^^;
* What was measured, and how?
This benchmark measures the real time needed to copy 10GB of
on-memory data, simulating /proc/vmcore, onto multiple different
disks, either with no compression or with LZO or snappy compression.
The data is randomized, so it is effectively incompressible:
compression saves no I/O, and the benchmark therefore represents the
worst case for the compressed formats.
The simulated dump data is exported as /proc/fakevmcore.
- Parameters
  - number of writing/compressing threads (and so the degree of I/O
    multiplexing)
    - 1 ~ 8
  - compression format
    - raw
    - lzo
    - snappy
  - kernel version
    - v3.4
    - RHEL6.2 (2.6.32-220)
    - RHEL5.8 (2.6.18-238)
Example)
- Create a fakevmcore of 10GB with a block size of 4kB.
- Split the I/O across two different disks: /mnt/disk{0,1}.
- The block size for compression is 4kB.
- Compress the data with LZO: -c selects LZO, -s selects snappy.
- Flush the page cache after nsplit finishes.
$ insmod ./fakevmcore.ko fakevmcore_size=$((10*1024*1024*1024)) fakevmcore_block_size=4096
$ time { nsplit -c --blocksize=4096 /proc/fakevmcore /mnt/disk0/a /mnt/disk1/a ; \
echo 3 > /proc/sys/vm/drop_caches; }
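Since fakevmcore.c could not be attached, here is a minimal sketch of
how such a module can export randomized data through /proc. It is
written against the v3.4-era proc API; the 1MB random pool and other
details are illustrative choices of mine, not necessarily what the
real fakevmcore.c does.

/* fakevmcore sketch (illustrative only, not the actual fakevmcore.c).
 * Exports /proc/fakevmcore, a pseudo file of fakevmcore_size bytes of
 * random data, standing in for /proc/vmcore. */
#include <linux/module.h>
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/vmalloc.h>
#include <linux/random.h>
#include <linux/uaccess.h>

static unsigned long fakevmcore_size = 1UL << 30;
static unsigned long fakevmcore_block_size = 4096; /* unused in this sketch */
module_param(fakevmcore_size, ulong, 0444);
module_param(fakevmcore_block_size, ulong, 0444);

#define POOL_SIZE (1UL << 20)	/* 1MB random pool, served repeatedly */
static void *pool;

static ssize_t fakevmcore_read(struct file *file, char __user *buf,
			       size_t count, loff_t *ppos)
{
	loff_t pos = *ppos;
	size_t done = 0;

	if (pos >= fakevmcore_size)
		return 0;
	if (count > fakevmcore_size - pos)
		count = fakevmcore_size - pos;

	while (done < count) {
		size_t off = (pos + done) % POOL_SIZE;
		size_t n = min_t(size_t, count - done, POOL_SIZE - off);

		if (copy_to_user(buf + done, pool + off, n))
			return -EFAULT;
		done += n;
	}
	*ppos = pos + done;
	return done;
}

static const struct file_operations fakevmcore_fops = {
	.owner	= THIS_MODULE,
	.read	= fakevmcore_read,
	.llseek	= default_llseek,
};

static int __init fakevmcore_init(void)
{
	pool = vmalloc(POOL_SIZE);
	if (!pool)
		return -ENOMEM;
	get_random_bytes(pool, POOL_SIZE);	/* incompressible contents */
	if (!proc_create("fakevmcore", 0444, NULL, &fakevmcore_fops)) {
		vfree(pool);
		return -ENOMEM;
	}
	return 0;
}

static void __exit fakevmcore_exit(void)
{
	remove_proc_entry("fakevmcore", NULL);
	vfree(pool);
}

module_init(fakevmcore_init);
module_exit(fakevmcore_exit);
MODULE_LICENSE("GPL");

Serving a pre-filled random pool keeps the read path cheap, so the
measured time is dominated by compression and disk I/O rather than by
data generation; and because every 4kB block is random, per-block
compression gains nothing, which is exactly the worst case described
above.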
To build nsplit.c on fc16, the following compression libraries are required:
- lzo-devel, lzo-minilzo, lzo
- snappy-devel, snappy
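nsplit.c could not be attached either, so what follows is my
reconstruction rather than the real code. A build command on fc16
would look roughly like this (the exact flags are a guess):

$ gcc -O2 -pthread -o nsplit nsplit.c -llzo2 -lsnappy

And here is a minimal sketch of the per-thread worker, assuming a
round-robin block assignment; main() (not shown) would call
lzo_init() once and start one such thread per output file:

/* nsplit worker sketch (illustrative): thread `idx' of `nthreads'
 * handles blocks idx, idx+nthreads, idx+2*nthreads, ... of the
 * source, compresses each block, and appends it to its own output. */
#include <lzo/lzo1x.h>
#include <snappy-c.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

enum fmt { RAW, LZO, SNAPPY };

struct worker {
	const char *src, *dst;	/* source file; this thread's output */
	size_t bs;		/* block size, e.g. 4096 */
	int idx, nthreads;	/* thread index; number of threads */
	enum fmt fmt;		/* raw (no compression), lzo, or snappy */
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;
	size_t lzo_max = w->bs + w->bs / 16 + 64 + 3;	/* LZO worst case */
	size_t snap_max = snappy_max_compressed_length(w->bs);
	unsigned char *in = malloc(w->bs);
	unsigned char *out = malloc(lzo_max > snap_max ? lzo_max : snap_max);
	unsigned char *wrkmem = malloc(LZO1X_1_MEM_COMPRESS);
	int sfd = open(w->src, O_RDONLY);
	int dfd = open(w->dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	off_t off = (off_t)w->idx * w->bs;
	ssize_t rd;

	while ((rd = pread(sfd, in, w->bs, off)) > 0) {
		size_t n;	/* length of the (compressed) block */

		if (w->fmt == LZO) {
			lzo_uint ln;
			lzo1x_1_compress(in, rd, out, &ln, wrkmem);
			n = ln;
		} else if (w->fmt == SNAPPY) {
			n = snap_max;
			snappy_compress((const char *)in, rd,
					(char *)out, &n);
		} else {
			memcpy(out, in, rd);
			n = rd;
		}
		write(dfd, out, n);	/* real code must check errors */
		/* skip over the blocks handled by the other threads */
		off += (off_t)w->bs * w->nthreads;
	}
	close(sfd); close(dfd);
	free(in); free(out); free(wrkmem);
	return NULL;
}

Splitting by block index keeps each thread's writes purely sequential
on its own disk, which is what lets the I/O multiplexing scale with n.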
* Results
n: number of writing and compressing threads
- upstream v3.4 kernel
  n  raw        lzo        snappy
  1  1m29.617s  2m41.979s  1m9.592s
  2  1m8.519s   1m26.555s  1m26.902s
  3  0m48.653s  1m0.462s   0m35.172s
  4  0m28.039s  0m47.248s  0m28.430s
  5  0m23.491s  0m37.181s  0m23.435s
  6  0m18.202s  0m28.428s  0m18.580s
  7  0m15.897s  0m29.873s  0m16.678s
  8  0m13.659s  0m23.180s  0m13.922s
- RHEL6.2 (2.6.32-220)
  n  raw        lzo        snappy
  1  0m53.119s  2m36.603s  1m33.061s
  2  1m31.578s  1m28.808s  0m49.492s
  3  0m31.675s  0m57.540s  0m33.795s
  4  0m37.714s  0m45.035s  0m32.871s
  5  0m20.363s  0m34.988s  0m21.894s
  6  0m22.602s  0m31.216s  0m19.195s
  7  0m18.837s  0m25.204s  0m15.906s
  8  0m13.715s  0m22.228s  0m13.884s
- RHEL5.8 (2.6.18-238)
  n  raw        lzo        snappy
  1  0m55.144s  1m20.771s  1m4.140s
  2  0m52.157s  1m8.336s   1m1.089s
  3  0m50.172s  0m41.329s  0m47.859s
  4  0m35.409s  0m28.764s  0m43.286s
  5  0m22.974s  0m20.501s  0m20.197s
  6  0m17.430s  0m18.072s  0m19.524s
  7  0m14.222s  0m14.936s  0m15.603s
  8  0m13.071s  0m14.755s  0m13.313s
- With 8-way I/O multiplexing, throughput improves by a factor of 4~5
  for raw, 5~6 for lzo, and 6~8 for snappy.
- 10GB per 15sec corresponds to 1TB per 25min 36sec
  (1024GB / 10GB x 15sec = 1536sec).
- snappy is as quick as raw. I think snappy can be used with very low
  risk even in this worst case.
- lzo is slower than raw and snappy, but parallel compression still
  scales well. Although lzo is the worst of the three in this
  benchmark, I expect it could beat the other two on a faster CPU and
  with sparser, more compressible data.
- With lzo, RHEL5.8's results are better than those of v3.4 and
  RHEL6.2. Perhaps due to differences in how the I/O workload is
  handled? I don't know the precise reason.
* TODO
- Redo the benchmark using physical disks only.
- Evaluate btrfs's transparent compression for large data; for very
  large data, compression in kernel space has an advantage over
  compression in user space.
Thanks.
HATAYAMA, Daisuke