[GIT PULL] Qualcomm driver updates for v6.3

Fri Mar 24 10:26:57 PDT 2023

On Wed, Feb 15, 2023 at 04:05:36PM +0100, Arnd Bergmann wrote:
> On Mon, Jan 30, 2023, at 23:24, Bjorn Andersson wrote:
> > On Mon, Jan 30, 2023 at 04:18:45PM +0100, Arnd Bergmann wrote:
> >> On Thu, Jan 26, 2023, at 17:30, Bjorn Andersson wrote:
> >> 
> >> I don't feel comfortable merging the DCC driver through drivers/soc/
> >> at this point: This is the first time I see the driver and it introduces
> >> a complex user space ABI that I have no time to review as part of the
> >> merge process.
> >> 
> >
> > The DCC driver has gone through 22 versions over the last 23 months,
> > but now that I look back I do agree that the recipients list has been
> > too limited.
> >
> > Furthermore, due to the complexity of the ABI I steered this towards
> > debugfs, with the explicit mention that we will change the interface
> > if needed - in particular since not a lot of review interest has
> > been shown...
> 
> I'm sorry to hear this has already taken so long, I understand it's
> frustrating to come up with a good userspace interface for any of
> this.
> 
> >> I usually try to avoid adding any custom user space interfaces
> >> in drivers/soc, as these tend to be things that end up being
> >> similar to other chips and need a generic interface.
> >> 
> >
> > I have no concern with that, but I'm not able to suggest an existing
> > subsystem where this would fit.
> >
> >> In particular I don't see an explanation about how the new interface
> >> relates to the established drivers/hwtracing/ subsystem and why it
> >> shouldn't be part of that (adding the hwtracing and coresight
> >> maintainers to Cc in case they have looked at this already).
> >> 
> >
> > To my knowledge the hwtracing framework is an interface for
> > enabling/disabling traces and then you get a stream of trace data out of
> > it.
> >
> > With DCC you essentially write a small "program" to be run at the time
> > of an exception (or triggered manually). When the "program" is run it
> > acquires data from MMIO interfaces and stores it in SRAM, which can
> > then be retrieved - possibly after the fatal reset of the system.
> >
> > Perhaps I've misunderstood the hwtracing framework, please help me steer
> > Souradeep towards a subsystem you find suitable for this functionality.
> 
> I'm also not too familiar with tracing infrastructure and was hoping
> that the coresight maintainers (Mathieu, Suzuki, Mike and Leo)
> would have some suggestions here. My initial guess was that in
> both cases, you have hardware support that is abstracted by the
> kernel in order to have a user interface that can be consumed
> by the 'perf' tool. I probably misinterpreted the part about the
> crash based trigger here, as my original (brief) reading was that
> the data snapshot could be triggered by any kind of event in
> the machine, which would make this useful as a more general
> way of tracing the state of devices at runtime. Can you describe
> how the crash trigger works, and if this would be usable with
> other random hardware events aside from an explicit software
> event?
> 
> I've added the perf maintainers to Cc as well now, for reference,
> the now reverted commit is at
> https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux.git/commit/?h=drivers-for-6.3&id=4cbe60cf5ad62
> and it contains both the implementation and the documentation
> of the debugfs interface.
> 
> One bit I don't see is the user space side. Is there a patch for
> perf as well, or is the idea to use a custom tool for this? How
> does userspace know which MMIO addresses are even valid here?
> 
> If the possible use is purely for saving some state across
> a reboot, as opposed to other events, I wonder if there is
> a good way to integrate it into the fs/pstore/ code, which
> already has a way to multiplex various kinds of input (log
> buffer, ftrace call chain, userspace strings, ...) into
> various kinds of persistent buffers (sram, blockdev, mtd,
> efivars, ...) with the purpose of helping analyze the
> state after a reboot. 
> 

IIUC pstore provides a place to store system state for analysis after a
reboot, but DCC essentially provides register dumps on demand, with the
system reset being a special-case trigger.
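
To make "register dumps on demand" a bit more concrete, here is a toy
userspace model of the idea. It is only a sketch: every name in it is
made up for illustration and none of it is taken from the DCC driver or
its debugfs interface. A list of register indices stands in for the DCC
"program", a plain array stands in for the MMIO space, and another array
stands in for the DCC SRAM. The same trigger function would serve both a
manual request and the reset case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; these names do not exist in the kernel. */
enum { TOY_SRAM_WORDS = 8 };

struct toy_dcc {
	const uint32_t *regs;		/* stand-in for the MMIO space */
	size_t nregs;
	const size_t *program;		/* register indices to capture */
	size_t proglen;
	uint32_t sram[TOY_SRAM_WORDS];	/* stand-in for the DCC SRAM */
	size_t captured;		/* words stored by the last trigger */
};

/*
 * Run the capture list once and return the number of words stored.
 * Entries outside the modelled register space are skipped, and capture
 * stops when the toy SRAM is full.
 */
size_t toy_dcc_trigger(struct toy_dcc *d)
{
	assert(d != NULL);

	d->captured = 0;
	for (size_t i = 0; i < d->proglen && d->captured < TOY_SRAM_WORDS; i++) {
		size_t idx = d->program[i];

		if (idx >= d->nregs)
			continue;
		d->sram[d->captured++] = d->regs[idx];
	}
	return d->captured;
}
```

The point of the model is that the captured words land in the device's
own buffer at trigger time, whatever the trigger was.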

So I think it would look neat to expose the DCC data alongside other
pstore data (except for the mentioned issues with ramoops not working on
most Qualcomm devices), but when the reboot happens DCC captures data in
the DCC SRAM, not in the pstore (whatever backing might be used). So
post boot, something would need to get the DCC data into the pstore.

To me this sounds like it conflicts with the pstore design.

Furthermore, with the reboot trigger being the special case, we'd need
to amend the pstore state at runtime to capture the case where the user
requested the DCC to capture the state.

One idea that I was looking at was to trigger a devcoredump as a way to
get the data out of the kernel, instead of a new device node. But it
doesn't seem to fit very well with existing use cases, and I haven't
used DCC sufficiently - given that it doesn't exist upstream...

We made significant changes to the control interface through the review
process, and I think we have something that looks reasonable now. But I
picked the patches under the premise that the interface is unstable and
lives in debugfs, and that exposing the tool to users could lead to more
interest in polishing it.

> >> Can you send an updated pull request that leaves out the
> >> DCC driver until we have clarified these points?
> >> 
> >
> > I will send a new pull request, with the driver addition reverted. I
> > don't think there's anything controversial with the DT binding, so let's
> > keep that and the dts nodes (we can move the yaml if a better home is
> > found...)
> 
> Right, this is fine. I merged the first pull request after I saw the
> revert in the second drivers branch, though I did not see a pull request
> from you that replaced the first one with just the revert added as
> I had expected. Also, I see that patchwork never noticed me merging
> the PR, so you did not get the automated email. Maybe you can double
> check the contents of the soc/drivers branch to see if the contents
> are what you expect them to be.
> 

I've promised the ChromeOS team to try really hard to keep the commits
in my branch stable, so I really try to avoid rebasing commits that have
been present in linux-next for a while.

Regards,
Bjorn