[BUG] New Kernel Bugs

Tue Nov 13 21:10:34 EST 2007

On Tue, 13 Nov 2007 17:11:36 -0800 Stephen Hemminger <shemminger at linux-foundation.org> wrote:

> On Tue, 13 Nov 2007 19:52:17 -0500
> Chuck Ebbert <cebbert at redhat.com> wrote:
> 
> > On 11/13/2007 04:12 PM, Alan Cox wrote:
> > >> Bug fixing is not about finding someone to blame, it's about getting the 
> > >> bug fixed.
> > > 
> > > Partly - its also about understanding why the bug occurred and making it
> > > not happen again.
> > 
> > Very few people think about that part.
> 
> Why does the kernel have very few useful tests?

Tests would of course be nice, but they aren't very useful(!)

Looking at this list which Natalie has generated I see around thirty which
are dependent on the right hardware and ten which are not.  This ratio is
typical, I think.  In fact I'd say that more than 75% of reported bugs are
dependent on hardware.

So the best test of all for the kernel is "run it on a different machine". 
This is why we are sooooo dependent upon our volunteer testers/reporters to
be able to do kernel development.

>  Lack of interest? resources? expertise?
> Ideally each new feature would just be a small add on to an existing test.

Sure.  For system-call-visible features it would be good to do that.

But this tends not to be where bugs get exposed.  Because the original
developer can 100% exercise such code.  That isn't the case with
driver/arch/platform changes.

> Unlike developing new features which seems to grow well with more developers.
> Bug fixing also seems to be a scarcity process. There often seems to be
> a very few people that understand the problem well enough or have the necessary
> hardware to reproduce and fix the problem.

We're 100% dead if "having the hardware" is a prerequisite to fixing a bug.
The terminal state there is that the kernel runs on about 200 machines
worldwide.  We have to work with reporters via email to fix these sorts of
things.  As we of course do.

> Recent changes like tickless and scheduler rework were well thought out and caused
> very little impact to 90% of the users. The problem is the 10% who do have problems.
> Worse, the developers often only hear about the a small sample of those.

Yes.  An unknown number of people just shrug and go back to an old kernel.