[PATCH RFC 1/2] coding-style.rst: document BUG() and WARN() rules ("do not crash the kernel")

Dave Young dyoung at redhat.com
Thu Aug 25 18:43:54 PDT 2022


Hi David,

[Added more people in cc]

On Thu, 25 Aug 2022 at 20:13, David Hildenbrand <david at redhat.com> wrote:
>
> On 24.08.22 23:59, John Hubbard wrote:
> > On 8/24/22 09:30, David Hildenbrand wrote:
> >> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> >> index 03eb53fd029a..a6d81ff578fe 100644
> >> --- a/Documentation/process/coding-style.rst
> >> +++ b/Documentation/process/coding-style.rst
> >> @@ -1186,6 +1186,33 @@ expression used.  For instance:
> >>      #endif /* CONFIG_SOMETHING */
> >>
> >
> > I like the idea of adding this documentation, and this is the right
> > place. Naturally, if one likes something, one must immediately change
> > it. :) Therefore, here is an alternative writeup that I think captures
> > what you and the email threads were saying.
> >
> > How's this sound?
>
> Much better, thanks! :)
>
> >
> > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> > index 03eb53fd029a..32df0d503388 100644
> > --- a/Documentation/process/coding-style.rst
> > +++ b/Documentation/process/coding-style.rst
> > @@ -1185,6 +1185,53 @@ expression used.  For instance:
> >         ...
> >         #endif /* CONFIG_SOMETHING */
> >
> > +22) Do not crash the kernel
> > +---------------------------
> > +
> > +Use WARN() rather than BUG()
> > +****************************
> > +
> > +Do not add new code that uses any of the BUG() variants, such as BUG(),
> > +BUG_ON(), or VM_BUG_ON(). Instead, use a WARN*() variant, preferably
> > +WARN_ON_ONCE(), and possibly with recovery code. Recovery code is not required
> > +if there is no reasonable way to at least partially recover.
>
> I'll tend to keep in this section:
>
> "Unavoidable data corruption / security issues might be a very rare
> exception to this rule and need good justification."
>
> Because there are rare exceptions, and I'd much rather document the
> clear exception to this rule.
>
> > +
> > +Use WARN_ON_ONCE() rather than WARN() or WARN_ON()
> > +**************************************************
> > +
> > +WARN_ON_ONCE() is generally preferred over WARN() or WARN_ON(), because it is
> > +common for a given warning condition, if it occurs at all, to occur multiple
> > +times. (For example, once per file, or once per struct page.) This can fill up
>
> I'll drop the "For example" part. I feel like this doesn't really need
> an example -- most probably we've all been there already when the kernel
> log was flooded :)
>
> > +and wrap the kernel log, and can even slow the system enough that the excessive
> > +logging turns into its own, additional problem.
> > +
> > +Do not WARN lightly
> > +*******************
> > +
> > +WARN*() is intended for unexpected, this-should-never-happen situations. WARN*()
> > +macros are not to be used for anything that is expected to happen during normal
> > +operation. These are not pre- or post-condition asserts, for example. Again:
> > +WARN*() must not be used for a condition that is expected to trigger easily, for
> > +example, by user space actions. pr_warn_once() is a possible alternative, if you
> > +need to notify the user of a problem.
> > +
> > +Do not worry about panic_on_warn users
> > +**************************************
> > +
> > +A few more words about panic_on_warn: Remember that ``panic_on_warn`` is an
> > +available kernel option, and that many users set this option. This is why there
> > +is a "Do not WARN lightly" writeup, above. However, the existence of
> > +panic_on_warn users is not a valid reason to avoid the judicious use WARN*().
> > +That is because, whoever enables panic_on_warn has explicitly asked the kernel
> > +to crash if a WARN*() fires, and such users must be prepared to deal with the
> > +consequences of a system that is somewhat more likely to crash.
>
> Side note: especially with kdump() I feel like we might see much more
> widespread use of panic_on_warn to be able to actually extract debug
> information in a controlled manner -- for example on enterprise distros.
> ... which would then make these systems more likely to crash, because
> there is no way to distinguish a rather harmless warning from a severe
> warning :/ . But let's see if some kdump() folks will share their
> opinion as reply to the cover letter.

I can understand the intention of this patch, and I totally agree that
BUG() should be used carefully, this is a good proposal if we can
clearly define the standard about when to use BUG().  But I do have
some worries,  I think this standard is different for different sub
components, it is not clear to me at least,  so this may introduce an
unstable running kernel and cause troubles (eg. data corruption) with
a WARN instead of a BUG. Probably it would be better to say "Do not
WARN lightly, and do not hesitate to use BUG if it is really needed"?

About "patch_on_warn", it will depend on the admin/end user to set it,
it is not a good idea for distribution to set it. It seems we are
leaving it to end users to take the risk of a kernel panic even with
all kernel WARN even if it is sometimes not necessary.


>
> --
> Thanks,
>
> David / dhildenb
>

Thanks
Dave




More information about the kexec mailing list