[BUG] New Kernel Bugs
mingo at elte.hu
Tue Nov 13 08:40:29 EST 2007
* Andrew Morton <akpm at linux-foundation.org> wrote:
> > > Do you believe that our response to bug reports is adequate?
> > Do you feel that making us feel and look like shit helps?
> That doesn't answer my question.
> See, first we need to work out whether we have a problem. If we do
> this, then we can then have a think about what to do about it.
> I tried to convince the 2006 KS attendees that we have a problem and I
> resoundingly failed. People seemed to think that we're doing OK.
> But it appears that data such as this contradicts that belief.
> This is not a minor matter. If the kernel _is_ slowly deteriorating
> then this won't become readily apparent until it has been happening
> for a number of years. By that stage there will be so much work to do
> to get us back to an acceptable level that it will take a huge effort.
> And it will take a long time after that for the kernel to get its
> reputation back.
> So it is important that we catch deterioration *early* if it is
Yes, yes, yes, and I agree with you that there is a problem. I tried to
make this point at the 2007 KS: not only is degradation in quality not
apparent for years, slow degradation in quality can give kernel
developers the exact _opposite_ perception! (Fewer testers means fewer
bug reports, and that results in apparently "improved" quality and fewer
reported regressions, while exactly the opposite is happening and
testers are leaving us without giving us any indication that this is
happening. We just don't notice.)
I'm not moaning about bugs that slip through - those are unavoidable
facts of a high-flux codebase. I'm moaning about recurring, avoidable
bugs, I'm moaning about hostility towards testers, I'm moaning about
hostility towards automated testing, I'm moaning about unnecessary hoops
a willing (but unskilled) tester has to go through to help us out.
I tried to make the point that the only good approach is to remove our
current subjective bias from quality metrics and to at least realize
what a cavalier attitude towards QA we still have. The moment we are able
to _measure_ how bad we are, kernel developers will adopt those metrics in
a second and will work to improve them. Let's use more debug tools, both
static and dynamic ones. Let's measure the tester base, and let's measure
_lost_ early adopters and the reasons why they are lost. Regression
metrics are a very important first step too, and I'm very happy about the
increasing effort that is being spent on this.
This is all QA-101 that _cannot be argued against on a rational basis_;
it's just that these sorts of things have been largely ignored for
years in favor of the all-too-easy "open source means many eyeballs and
that is our QA" answer, which is a _good_ answer but far from the most
intelligent one! Today "many eyeballs" is simply not good enough, and
nature (and other OS projects) will route around us if we don't change.
We kernel developers have been spoiled by years of abundant testing
resources. We squander tons of resources in this area, and we could be
so much more economical about this without hindering our development
model in any way. We could be so much better at QA, and everyone would
benefit without having to compromise on the incoming flux of changes:
it's so much easier to write new features for a high-quality kernel.
My current guesstimation is that we are utilizing our current testing
resources at around 10% efficiency. (i.e. if we did an 'ideal' job we
could fix 10 times as many bugs with the same size of tester effort!) It
used to be around 5%. (and i mainly attribute the increase from 5% to
10% to Andrew and the many other people who do kernel QA - kudos!) 10%
is still awful and we very much suck.
Paradoxically, the "end product" is still of considerably good quality in
absolute terms because other pieces of our infrastructure are so good
and powerful, but QA is still a 'weak link' in our path to the user that
reduces the quality of the end result. We could _really_ be so much
better without any compromises that hurt.
(and this is in no way directed at the networking folks - it holds for
all of us. I have one main complaint about networking: the separate
netdev list is a bad idea - networking regressions should be discussed
and fixed on lkml, like most other subsystems are. Any artificial split
of the lk discussion space is bad.)