[PATCH v10 08/10] dyndbg: add print-to-tracefs, selftest with it - RFC

Mon Nov 22 14:42:38 PST 2021

On Mon, Nov 22, 2021 at 2:02 AM Pekka Paalanen <ppaalanen at gmail.com> wrote:
>
> On Fri, 19 Nov 2021 11:21:36 -0500
> Jason Baron <jbaron at akamai.com> wrote:
>
> > On 11/18/21 10:24 AM, Pekka Paalanen wrote:
> > > On Thu, 18 Nov 2021 09:29:27 -0500
> > > Jason Baron <jbaron at akamai.com> wrote:
> > >
> > >> On 11/16/21 3:46 AM, Pekka Paalanen wrote:
> > >>> On Fri, 12 Nov 2021 10:08:41 -0500
> > >>> Jason Baron <jbaron at akamai.com> wrote:
> > >>>
> > >>>> On 11/12/21 6:49 AM, Vincent Whitchurch wrote:
> > >>>>> On Thu, Nov 11, 2021 at 03:02:04PM -0700, Jim Cromie wrote:
> > >>>>>> Sean Paul proposed, in:
> > >>>>>> https://urldefense.com/v3/__https://patchwork.freedesktop.org/series/78133/__;!!GjvTz_vk!HcKnMRByYkIdyF1apqQjlN5aBIomzJR1an3YWXM6KXs0EftVMQdrewRA8Dki4A$
> > >>>>>> drm/trace: Mirror DRM debug logs to tracefs
> > >>>>>>
> > >>>>>> His patchset's objective is to be able to independently steer some of
> > >>>>>> the drm.debug stream to an alternate tracing destination, by splitting
> > >>>>>> drm_debug_enabled() into syslog & trace flavors, and enabling them
> > >>>>>> separately.  2 advantages were identified:
> > >>>>>>
> > >>>>>> 1- syslog is heavyweight, tracefs is much lighter
> > >>>>>> 2- separate selection of enabled categories means less traffic
> > >>>>>>
> > >>>>>> Dynamic-Debug can do 2nd exceedingly well:
> > >>>>>>
> > >>>>>> A- all work is behind jump-label's NOOP, zero off cost.
> > >>>>>> B- exact site selectivity, precisely the useful traffic.
> > >>>>>>    can tailor enabled set interactively, at shell.
> > >>>>>>
> > >>>>>> Since the tracefs interface is effective for drm (the threads suggest
> > >>>>>> so), adding that interface to dynamic-debug has real potential for
> > >>>>>> everyone including drm.
> > >>>>>>
> > >>>>>> if CONFIG_TRACING:
> > >>>>>>
> > >>>>>> Grab Sean's trace_init/cleanup code, use it to provide tracefs
> > >>>>>> available by default to all pr_debugs.  This will likely need some
> > >>>>>> further per-module treatment; perhaps something reflecting hierarchy
> > >>>>>> of module,file,function,line, maybe with a tuned flattening.
> > >>>>>>
> > >>>>>> endif CONFIG_TRACING
> > >>>>>>
> > >>>>>> Add a new +T flag to enable tracing, independent of +p, and add and
> > >>>>>> use 3 macros: dyndbg_site_is_enabled/logging/tracing(), to encapsulate
> > >>>>>> the flag checks.  Existing code treats T like other flags.
> > >>>>>
> > >>>>> I posted a patchset a while ago to do something very similar, but that
> > >>>>> got stalled for some reason and I unfortunately didn't follow it up:
> > >>>>>
> > >>>>>  https://urldefense.com/v3/__https://lore.kernel.org/lkml/20200825153338.17061-1-vincent.whitchurch@axis.com/__;!!GjvTz_vk!HcKnMRByYkIdyF1apqQjlN5aBIomzJR1an3YWXM6KXs0EftVMQdrewRGytKHPg$
> > >>>>>
> > >>>>> A key difference between that patchset and this patch (besides that
> > >>>>> small fact that I used +x instead of +T) was that my patchset allowed
> > >>>>> the dyndbg trace to be emitted to the main buffer and did not force them
> > >>>>> to be in an instance-specific buffer.
> > >>>>
> > >>>> Yes, I agree I'd prefer that we print here to the 'main' buffer - it
> > >>>> seems to keep things simpler and easier to combine the output from
> > >>>> different sources as you mentioned.
> > >>>
> > >>> Hi,
> > >>>
> > >>> I'm not quite sure I understand this discussion, but I would like to
> > >>> remind you all of what Sean's original work is about:
> > >>>
> > >>> Userspace configures DRM tracing into a flight recorder buffer (I guess
> > >>> this is what you refer to "instance-specific buffer").
> > >>>
> > >>> Userspace runs happily for months, and then hits a problem: a failure
> > >>> in the DRM sub-system most likely, e.g. an ioctl that should never
> > >>> fail, failed. Userspace handles that failure by dumping the flight
> > >>> recorder buffer into a file and saving or sending a bug report. The
> > >>> flight recorder contents give a log of all relevant DRM in-kernel
> > >>> actions leading to the unexpected failure to help developers debug it.
> > >>>
> > >>> I don't mind if one can additionally send the flight recorder stream to
> > >>> the main buffer, but I do want the separate flight recorder buffer to
> > >>> be an option so that a) unrelated things cannot flood the interesting
> > >>> bits out of it, and b) the scope of collected information is relevant.
> > >>>
> > >>> The very reason for this work is problems that are very difficult to
> > >>> reproduce in practice, either because the problem itself is triggered
> > >>> very rarely and randomly, or because the end users of the system have
> > >>> either no knowledge or no access to reconfigure debug logging and then
> > >>> reproduce the problem with good debug logs.
> > >>>
> > >>> Thank you very much for pushing this work forward!
> > >>>
> > >>>
> > >>
> > >> So I think Vincent (earlier in the thread) was saying that he finds it
> > >> very helpful have dynamic debug output go to the 'main' trace buffer,
> > >> while you seem to be saying you'd prefer it just go to dynamic debug
> > >> specific trace buffer.
> > >
> > > Seems like we have different use cases: traditional debugging, and
> > > in-production flight recorder for problem reporting. I'm not surprised
> > > if they need different treatment.
> > >
> > >> So we certainly can have dynamic output potentially go to both places -
> > >> although I think this would mean two tracepoints? But I really wonder
> > >> if we really need a separate tracing buffer for dynamic debug when
> > >> what goes to the 'main' buffer can be controlled and filtered to avoid
> > >> your concern around a 'flood'?
> > >
> > > If the DRM tracing goes into the main buffer, then systems in
> > > production cannot have any other sub-system traced in a similar
> > > fashion. To me it would feel very arrogant to say that to make use of
> > > DRM flight recording, you cannot trace much or anything else.
> > >
> > > The very purpose of the flight recorder is run in production all the
> > > time, not in a special debugging session.
> > >
> > > There is also the question of access and contents of the trace buffer.
> > > Ultimately, if automatic bug reports are enabled in a system, the
> > > contents of the trace buffer would be sent as-is to some bug tracking
> > > system. If there is a chance to put non-DRM stuff in the trace buffer,
> > > that could be a security problem.
> > >
> > > My use case is Weston. When Weston encounters an unexpected problem in
> > > production, something should automatically capture the DRM flight
> > > recorder contents and save it alongside the Weston log. Would be really
> > > nice if Weston itself could do that, but I suspect it is going to need
> > > root privileges so it needs some helper daemon.
> > >
> > > Maybe Sean can reiterate their use case more?
> > >
> > >
> > > Thanks,
> > > pq
> > >
> >
> > Ok, so in this current thread the proposal was to create a "dyndbg-tracefs"
> > buffer to put the dynamic debug output (including drm output from dynamic
> > debug) into. And I was saying let's just put in the 'main' trace buffer
> > (predicated on a dynamic debug specific tracepoint), since there seems
> > to be a a use-case for that and it keeps things simpler.
> >
> > But I went back to Sean's original patch, and it creates a drm specific
> > trace buffer "drm" (via trace_array_get_by_name("drm")). Here:
> > https://patchwork.freedesktop.org/patch/445549/?series=78133&rev=5
> >
> > So I think that may be some of the confusion here? The current thread/
> > proposal is not for a drm specific trace buffer...
>
> Hi Jason,
>
> I may very well have confused things, sorry about that. If this series
> is not superseding the idea of the DRM flight recorder, then don't mind
> me. It just sounded very similar and I also haven't seen new revisions
> of the flight recorder in a long time.

IMO this series has clarified the requirement for a flight-recorder mode,
which seems to fit ideally in a separate instance.

> > Having a subsystem specific trace buffer would allow subsystem specific
> > trace log permissions depending on the sensitivity of the data. But
> > doesn't drm output today go to the system log which is typically world
> > readable today?
>
> Yes, and that is exactly the problem. The DRM debug output is so high
> traffic it would make the system log both unusable due to cruft and
> slow down the whole machine. The debug output is only useful when
> something went wrong, and at that point it is too late to enable
> debugging. That's why a flight recorder with an over-written circular
> in-memory buffer is needed.

Seans patch reuses enum drm_debug_category to split the tracing
stream into 10 sub-streams
- how much traffic from each ?
- are some sub-streams more valuable for post-mortem ?
- any value from further refinement of categories ?
- drop irrelevant callsites individually to reduce clutter, extend
buffer time/space ?