ZenIV (was: Re: Status of arm-soc.git for 3.2)

Nicolas Pitre nico at fluxnic.net
Wed Jan 4 15:32:13 EST 2012


On Wed, 4 Jan 2012, Russell King - ARM Linux wrote:

> On Wed, Jan 04, 2012 at 07:10:42PM +0000, Russell King - ARM Linux wrote:
> > On Wed, Jan 04, 2012 at 01:55:48PM -0500, Nicolas Pitre wrote:
> > > On Wed, 4 Jan 2012, Russell King - ARM Linux wrote:
> > > > I've now disabled all access to my git tree there, and restarted apache.
> > > > We will see whether that improves stability - I suspect it will do because
> > > > I reckon that the problem is that the smart git stuff is what's killing
> > > > the machine.
> > > 
> > > I'm sure that is the case.  However, faulty hardware could still be the
> > > root cause; without the load from Git the machine might become
> > > lightly enough loaded that you might not see any ill effects for quite
> > > a while.
> > 
> > When it's serving the next Fedora release (it's one of the official
> > mirrors) I'm sure any problems like that would become noticeable - but
> > they haven't yet.  It purely seems to be something git is doing which
> > is killing the machine.
> 
> I think we've just found the issue causing httpd to die:
> 
> - Fedora systems set a limit at login time on the number of processes a
>   user can run - which is set to a soft limit of 1024.
> 
> - When someone logs in, their shell inherits this soft rlimit.  This
>   gets inherited by all sub-processes, including through a su to their
>   root shell.
> 
> - When they restart httpd, httpd also inherits this limit.
> 
> - httpd's own internal limits are set to a max clients of 1024.
> 
> The problem comes when httpd hits 1024 processes - as it forks as root,
> this succeeds (root is not subjected to this rlimit), and then a
> subsequent setuid() fails with -EAGAIN, causing httpd to experience a
> fatal error.
> 
> Obviously, this is not a good combination of things to happen.  It's
> also completely unnoticeable to anyone who restarts apache.

Nasty.
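
For anyone wanting to check whether a box is exposed to the same
combination before restarting a daemon, here is a rough sketch in Python.
It assumes a Linux system where the `resource` module exposes
`RLIMIT_NPROC`; the function name `nproc_limit_can_bite` is made up for
illustration, and the comparison is only a conservative heuristic, not
anything httpd itself does.

```python
import resource


def nproc_limit_can_bite(soft_limit, max_clients):
    """Return True if max_clients workers could reach the soft process limit.

    Conservative: any other process owned by the same user (CGI helpers,
    for instance) counts against the limit too, so trouble can start earlier.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return max_clients >= soft_limit


if __name__ == "__main__":
    # The limits seen here are the ones a daemon started from this
    # shell would inherit.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    # 1024 is the max clients figure quoted above.
    at_risk = nproc_limit_can_bite(soft, 1024)
    print("soft=%s hard=%s at_risk=%s" % (soft, hard, at_risk))
```

Run from the same shell (after the su) that would restart httpd, this
reports the limits the daemon is about to inherit.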

> So, I've 'fixed' it by raising the rlimit in the httpd startup scripts,
> which should keep it fixed whenever anyone else restarts httpd.
> 
> The problem should be solved, and as such I've re-enabled access to
> the git tree.

OK, let's hope things will hold up.
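
The mail above does not say how the startup script raises the limit or to
what value, so the following is only a sketch of the general idea in
Python: raise the soft `RLIMIT_NPROC` to a wanted value without exceeding
the hard limit, before the daemon is started. The helper names
`raised_soft_limit` and `raise_nproc`, and the figure 4096, are
assumptions for illustration.

```python
import resource


def raised_soft_limit(soft, hard, wanted):
    """Compute the new soft limit: at least `wanted`, never above `hard`."""
    inf = resource.RLIM_INFINITY
    if soft == inf or (wanted != inf and soft >= wanted):
        return soft  # already high enough
    if hard == inf:
        return wanted
    if wanted == inf:
        return hard
    return min(wanted, hard)


def raise_nproc(wanted):
    """Raise this process's soft RLIMIT_NPROC; children inherit the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    new_soft = raised_soft_limit(soft, hard, wanted)
    if new_soft != soft:
        # Raising the soft limit up to the hard limit needs no privilege.
        resource.setrlimit(resource.RLIMIT_NPROC, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    # 4096 is an arbitrary example value, not taken from the thread.
    print(raise_nproc(4096))
```

Doing this in the startup path rather than in a login profile means the
limit no longer depends on who restarts the daemon or from which shell.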

I suspect this is unrelated to the memory exhaustion ZenIV experienced
in the past, though.  That may need a different fix if it happens
again.


Nicolas


