[PATCH] panic.c: export panic_on_oops

Mon Oct 12 09:15:29 EDT 2009

* Simon Kagstrom <simon.kagstrom at netinsight.net> wrote:

> On Mon, 12 Oct 2009 14:20:23 +0200
> Ingo Molnar <mingo at elte.hu> wrote:
> 
> > * David Woodhouse <dwmw2 at infradead.org> wrote:
> > 
> > > On Mon, 2009-10-12 at 14:09 +0200, Ingo Molnar wrote:
> > > > Also, would it be possible to just simplify the thing and not do any 
> > > > buffering at all? Extra buffering complexity in a console driver is only 
> > > > asking for trouble. Or is optimizing flash write cycles that 
> > > > important in this case?
> > > 
> > > That and the fact that on NAND flash you have to write full pages at a 
> > > time -- that's 512 bytes, 2KiB or 4KiB depending on the type of chip. 
> > > So we really do want to buffer it where we can.
> > > 
> > > We don't want to write a 2KiB page for every line of printk output.
> > 
> > Then i think the buffering is at the wrong place: we should instead 
> > buffer in the generic layer and pass it to lowlevel if we know that we 
> > have gone past a 2K boundary.
> >
> > The size of the generic log buffer is always a power of two so 
> > detecting 2K boundaries is very easy. On any emergency the generic 
> > console layer will do faster flushes - this is nothing the console 
> > driver itself should bother with.
> 
> But this is only part of the mtdoops problem (the reason why we don't 
> write all the time). The current code only stores messages printed 
> during an oops, and this behavior will surely change if the console 
> driver gets large buffers of output - or it would have to take in the 
> output unbuffered anyway.
> 
> My patch changes this behavior, and with that I don't think buffered 
> output would be a problem - it would indeed make it simpler, as you 
> say - assuming there is something like ->kernel_bug() that would flush 
> the last 4KiB or so of messages to mtdoops when there is an oops or 
> panic.
> 
> > And that would avoid the whole workqueue logic - which is fragile to 
> > be done in a printk to begin with.
> 
> I'm afraid I don't really see this issue. The workqueue is used to 
> write the buffer to the mtd device if we are not in a panic or 
> interrupt context - in which case we do it directly.
> 
> So it's only used when an oops is ongoing.

This fixation on 'panic' is so wrong!

90% of the bugs users care about don't involve any panic. And even if 
there is a panic down the line, most of the interesting messages are in 
the stream leading up to the panic - now tucked away in that async 
workqueue mechanism and not visible.

There are two clean solutions, I think:

1) add some new "ok, there's trouble!" callback to struct console and 
   the console driver could via that mechanism send out the _last_ 2KB 
   (or more) of kernel log messages. Basically we can go back in time by 
   looking at the dmesg buffer. The low level console driver does not 
   need to 'follow' the high level console state - it only wants to 
   print in case of trouble anyway.

2) or add buffered (flash-friendly) writes for all printk output - panic
   and non-panic alike. This would be useful to debug suspend/resume
   bugs for example. This would also optimize the packets of netconsole
   output. (Last I checked, we sent a packet per line.)
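
To make variant 1 concrete, here is a minimal user-space sketch of "going back in time" through a power-of-two log ring. It is not the kernel's printk code: log_buf, log_end, log_append(), log_puts() and log_copy_tail() are hypothetical names used only for illustration, and the 4096-byte size is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LOG_BUF_LEN  4096u               /* assumed size; must be a power of two */
#define LOG_BUF_MASK (LOG_BUF_LEN - 1u)

static char log_buf[LOG_BUF_LEN];
static unsigned long log_end;            /* total bytes ever logged, free-running */

/* Append bytes to the ring, overwriting the oldest data once it is full. */
void log_append(const char *s, size_t len)
{
	for (size_t i = 0; i < len; i++)
		log_buf[log_end++ & LOG_BUF_MASK] = s[i];
}

/* Convenience wrapper for NUL-terminated strings. */
void log_puts(const char *s)
{
	log_append(s, strlen(s));
}

/*
 * Hypothetical "there's trouble" hook: copy up to 'want' of the most
 * recent bytes into 'out', oldest first, and return the number copied.
 * The driver never has to follow the log during normal operation.
 */
size_t log_copy_tail(char *out, size_t want)
{
	size_t avail = log_end < LOG_BUF_LEN ? (size_t)log_end : LOG_BUF_LEN;
	size_t n = want < avail ? want : avail;
	unsigned long start = log_end - n;

	assert(out != NULL);
	for (size_t i = 0; i < n; i++)
		out[i] = log_buf[(start + i) & LOG_BUF_MASK];
	return n;
}
```

A driver built along these lines would call something like log_copy_tail(page, 2048) from the trouble callback and write the result out in one go. Because the ring size is a power of two, the index wrap is a single mask operation.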

The workqueue looks wrong in both variants. If we are panicking (or 
hanging, or ...) then we are halting the machine - the workqueue has no 
chance to actually execute.
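
As a rough illustration of variant 2 without any deferred work, the user-space sketch below buffers output and writes synchronously whenever a full page has accumulated, with a direct flush of the partial page on trouble. Everything here is hypothetical: flash_write_page() is a counting stand-in for the real low-level write, and the 2048-byte page is just one of the sizes mentioned earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 2048u        /* assumed NAND page size; real chips vary */

/* Hypothetical stand-in for the low-level flash write; it only counts pages. */
unsigned int pages_written;

void flash_write_page(const char *page)
{
	(void)page;
	pages_written++;
}

static char page_buf[PAGE_BYTES];
static size_t page_fill;

/* Buffer console output; write synchronously as soon as a page is full. */
void buffered_write(const char *s, size_t len)
{
	assert(s != NULL);
	while (len > 0) {
		size_t room = PAGE_BYTES - page_fill;
		size_t n = len < room ? len : room;

		memcpy(page_buf + page_fill, s, n);
		page_fill += n;
		s += n;
		len -= n;
		if (page_fill == PAGE_BYTES) {
			flash_write_page(page_buf);
			page_fill = 0;
		}
	}
}

/* On trouble: pad the partial page with 0xff and write it now, no deferral. */
void flush_on_trouble(void)
{
	if (page_fill == 0)
		return;
	memset(page_buf + page_fill, 0xff, PAGE_BYTES - page_fill);
	flash_write_page(page_buf);
	page_fill = 0;
}
```

The point of the sketch is that nothing is queued for later: either a full page goes out as part of the write path, or the trouble path pushes out whatever is pending before the machine halts.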

	Ingo