[GIT PULL] Identity mapping changes for 3.3
Nicolas Pitre
nico at fluxnic.net
Wed Dec 7 15:38:31 EST 2011
On Wed, 7 Dec 2011, Russell King - ARM Linux wrote:
> On Tue, Dec 06, 2011 at 11:25:47PM -0500, Nicolas Pitre wrote:
> > Make sure the repo on that machine is nicely packed. Running "git gc"
> > (gc as in garbage collect) once in a while is a good thing to do,
> > especially now that you have the smart HTTP protocol enabled. That will
> > bring memory usage way down, and serving requests will be much
> > faster too. It is safe to put that in a cron job once a week or so,
> > even if concurrent requests are being serviced.
>
> Well, I tried an experiment. On my laptop, if I run git fsck, it takes
> around 20 minutes to complete.
This is a bit long but reasonable.
> On ZenIV, I started this, this morning:
>
> $ GIT_DIR=linux-2.6-arm.git git fsck
>
> and it's now (this evening) some 10 hours later, and it's still going.
> This is the exact same repository (as it's an rsync'd copy of the git
> objects and packs which are on the laptop).
Ouch! No, this is no good.
[...]
> As you can see, git fsck seems to be pulling data at around 50MB/s,
> presumably for 9 hours - this is ridiculous because there's only 500MB
> of git data for it to read!
Well, of course the fsck process keeps in memory a graph of the
relationships between all objects and so on. So it certainly has the
potential to eat lots of memory and push the system into swap.
And I don't think fsck was optimized to minimize seeks the way
pack-objects was.
Of course, this is not very representative of a typical git pull,
though. Assuming that people are already updated to v3.2-rc1, you can
simulate the effect on the server by running:
  echo "v3.2-rc1..for-next" | \
  git pack-objects --progress --thin --revs --stdout > /dev/null
and you really really don't want such an operation to ever touch swap
space.
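That simulation needs the actual kernel repository and its tags. As a
minimal self-contained sketch of the same pack-objects invocation, here
it is run on a throwaway repository (all names and commit contents below
are made up for illustration):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
# Two commits standing in for v3.2-rc1 and for-next.
git -c user.email=a@b.c -c user.name=a commit -q --allow-empty -m base
git tag base
echo data > file
git add file
git -c user.email=a@b.c -c user.name=a commit -qm tip
git tag tip
# Generate the thin pack that a fetch of base..tip would transfer; on a
# real server you would redirect to /dev/null and watch memory usage.
echo "base..tip" | git pack-objects --thin --revs --stdout > pack.out
wc -c < pack.out
```

The pack lands in pack.out here only so its size can be inspected; the
point of the exercise on the server is the memory the process needs, not
the output.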
> What this is saying to me is that git can't run sensibly on a dual-core
> P4, 3GHz machine with 2G of RAM and 4G swap, with a disk IO subsystem
> capable of about 50MB/s - basically, git is driving ZenIV into the ground
> (and I believe git was also responsible for ZenIV having a load average
> hitting a few hundred several months ago which resulted in us having to
> have it rebooted.)
Most likely, yes. The disk throughput shouldn't be such a problem, but
the lack of RAM certainly is. And 2GB of RAM to deal with a repository
the size of the Linux kernel makes things tight.
So forget my suggestion about packing the repository on the server since
it certainly doesn't have enough RAM to do a decent job. You should
consider packing it elsewhere on a beefier machine, and copying the
resulting pack to ZenIV. Having a well-packed repository is one of the
best ways to limit memory consumption when serving a repository, but to
produce that pack you need quite a lot of RAM in the first place.
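The repack step on the beefier machine could look like the sketch below,
shown here on a small throwaway repository (the window/depth values are
illustrative; on the real server you would then copy the files under
objects/pack/ over, e.g. with rsync):

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
for i in 1 2 3; do
    echo "$i" > "f$i"
    git add "f$i"
    git -c user.email=a@b.c -c user.name=a commit -qm "commit $i"
done
# -a repacks everything, -d deletes the old packs, -f recomputes deltas
# from scratch; larger --window/--depth cost RAM but pack tighter.
git repack -q -a -d -f --window=250 --depth=250
ls .git/objects/pack/*.pack | wc -l
```

After the repack the repository holds a single, well-deltified pack,
which is the state you want the memory-starved server to serve from.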
I'm sure many corporations involved with ARM would be more than willing
to sponsor some 64-bit hardware for this server with at least 8GB of
RAM, which is relatively cheap these days.
> It's worth noting that Linus' tree currently has 19 pack files; my
> tree has an additional 9 on top of that.
One design requirement for the pack file format is that it be
self-sufficient on disk, i.e. no delta representation may refer to a
base object outside of the pack it is found in. The wire protocol is
the exact opposite: the packed data sent during a transfer omits any
object data the other end already has. So upon reception over the
smart protocol, the packed data has to be "completed" with a copy of
the locally available base objects for the pack to be self-contained.
There is therefore some data redundancy that accumulates as the number
of packs in a repository grows, hence the need to garbage collect once
in a while. Having 28 packs is normally not that big a deal, but the
relatively small amount of RAM may make it more critical here.
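That accumulation, and gc consolidating it away, can be watched on a
throwaway pair of repositories (all names are made up; setting
fetch.unpackLimit=1 forces even a tiny fetch to be stored as a pack
rather than exploded into loose objects, mimicking a large pull):

```shell
set -e
src=$(mktemp -d); work=$(mktemp -d)
git -C "$src" init -q
git -C "$src" -c user.email=a@b.c -c user.name=a \
    commit -q --allow-empty -m one
# Pack the source so the clone starts out with one pack of its own.
git -C "$src" repack -q -a -d
git clone -q "$src" "$work/clone"
cd "$work/clone"
# A later commit on the origin side...
git -C "$src" -c user.email=a@b.c -c user.name=a \
    commit -q --allow-empty -m two
# ...arrives as a thin pack that gets "completed" and kept, so an
# extra pack accumulates next to the one from the clone.
git -c fetch.unpackLimit=1 fetch -q origin
find .git/objects/pack -name '*.pack' | wc -l
# Garbage collection consolidates everything back into a single pack.
git gc --quiet --prune=now
find .git/objects/pack -name '*.pack' | wc -l
```

On a busy served repository the same consolidation is what the periodic
gc run buys you, at the cost of the RAM discussed above.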
Nicolas