[EXT] Re: [PATCH 5/7] coresight: tmc: Add support for reading tracedata from previous boot

Tue Oct 10 06:23:07 PDT 2023

Hi James,

> -----Original Message-----
> From: James Clark <james.clark at arm.com>
> Sent: Wednesday, October 4, 2023 7:18 PM
> To: Linu Cherian <lcherian at marvell.com>; suzuki.poulose at arm.com;
> mike.leach at linaro.org; leo.yan at linaro.org
> Cc: linux-arm-kernel at lists.infradead.org; coresight at lists.linaro.org;
> linux-kernel at vger.kernel.org; robh+dt at kernel.org;
> krzysztof.kozlowski+dt at linaro.org; conor+dt at kernel.org;
> devicetree at vger.kernel.org; Sunil Kovvuri Goutham
> <sgoutham at marvell.com>; George Cherian <gcherian at marvell.com>; Anil
> Kumar Reddy H <areddy3 at marvell.com>; Tanmay Jagdale
> <tanmay at marvell.com>
> Subject: [EXT] Re: [PATCH 5/7] coresight: tmc: Add support for reading
> tracedata from previous boot
> 
> 
> On 03/10/2023 17:43, James Clark wrote:
> >
> >
> > On 29/09/2023 14:37, Linu Cherian wrote:
> >> * Introduce a new mode CS_MODE_READ_PREVBOOT for reading tracedata
> >>   captured in the previous boot.
> >>
> >> * Add special handlers for preparing ETR/ETF for this special mode
> >>
> >> * The user can read the trace data as follows.
> >>
> >>   For example, to read trace data from a tmc_etf sink:
> >>
> >>   1. cd /sys/bus/coresight/devices/tmc_etfXX/
> >>
> >>   2. Change mode to READ_PREVBOOT
> >>
> >>      #echo 1 > read_prevboot
> >>
> >>   3. Dump the trace buffer data to a file:
> >>
> >>      #dd if=/dev/tmc_etfXX of=~/cstrace.bin
> >>
> >>   4. Reset back to normal mode
> >>
> >>      #echo 0 > read_prevboot
> >>
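For automated collection, steps 2 to 4 can be driven from a small user-space C program. This is a minimal sketch and not part of the patch: it assumes only what the steps show (write 1 or 0 to read_prevboot, read the device node until EOF). The attribute and device paths are passed in by the caller because the tmc_etfXX suffix is system specific, and write_attr(), copy_all() and dump_prevboot() are hypothetical helper names.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write a short value such as "1" or "0" to a sysfs attribute. */
static int write_attr(const char *path, const char *val)
{
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ssize_t n = write(fd, val, strlen(val));
	close(fd);
	return n == (ssize_t)strlen(val) ? 0 : -1;
}

/* Copy everything readable from src into dst, like `dd if=src of=dst`. */
static int copy_all(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n;
	int in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0) {
		close(in);
		return -1;
	}
	while ((n = read(in, buf, sizeof(buf))) > 0) {
		if (write(out, buf, n) != n) {
			n = -1;
			break;
		}
	}
	close(in);
	close(out);
	return n < 0 ? -1 : 0;
}

/* Steps 2-4: enter READ_PREVBOOT, dump the buffer, return to normal mode. */
static int dump_prevboot(const char *mode_attr, const char *dev,
			 const char *out)
{
	if (write_attr(mode_attr, "1"))
		return -1;
	int ret = copy_all(dev, out);
	/* Attempt the reset even if the copy failed. */
	if (write_attr(mode_attr, "0"))
		ret = -1;
	return ret;
}
```

Resetting read_prevboot unconditionally means a failed copy does not leave the sink in READ_PREVBOOT mode.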
> >> Signed-off-by: Anil Kumar Reddy <areddy3 at marvell.com>
> >> Signed-off-by: Tanmay Jagdale <tanmay at marvell.com>
> >> Signed-off-by: Linu Cherian <lcherian at marvell.com>
> >> ---
> >>  .../coresight/coresight-etm4x-core.c          |   1 +
> >>  .../hwtracing/coresight/coresight-tmc-core.c  |  81 +++++++++-
> >>  .../hwtracing/coresight/coresight-tmc-etf.c   |  62 ++++++++
> >>  .../hwtracing/coresight/coresight-tmc-etr.c   | 145 +++++++++++++++++-
> >>  drivers/hwtracing/coresight/coresight-tmc.h   |   6 +
> >>  include/linux/coresight.h                     |  13 ++
> >>  6 files changed, 306 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/hwtracing/coresight/coresight-etm4x-core.c b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >> index 77b0271ce6eb..513baf681280 100644
> >> --- a/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >> +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c
> >> @@ -1010,6 +1010,7 @@ static void etm4_disable(struct coresight_device *csdev,
> >>
> >>  	switch (mode) {
> >>  	case CS_MODE_DISABLED:
> >> +	case CS_MODE_READ_PREVBOOT:
> >>  		break;
> >>  	case CS_MODE_SYSFS:
> >>  		etm4_disable_sysfs(csdev);
> >> diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
> >> index 6658ce76777b..65c15c9f821b 100644
> >> --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> >> +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
> >> @@ -103,6 +103,45 @@ u32 tmc_get_memwidth_mask(struct tmc_drvdata *drvdata)
> >>  	return mask;
> >>  }
> >>
> >> +int tmc_read_prepare_prevboot(struct tmc_drvdata *drvdata)
> >> +{
> >> +	int ret = 0;
> >> +	struct tmc_register_snapshot *reg_ptr;
> >> +	struct coresight_device *csdev = drvdata->csdev;
> >> +
> >> +	if (!drvdata->metadata.vaddr) {
> >> +		ret = -ENOMEM;
> >> +		goto out;
> >> +	}
> >> +
> >> +	reg_ptr = drvdata->metadata.vaddr;
> >> +	if (!reg_ptr->valid) {
> >> +		dev_err(&drvdata->csdev->dev,
> >> +			"Invalid metadata captured from previous boot\n");
> >> +		ret = -EINVAL;
> >> +		goto out;
> >> +	}
> >
> > I'm wondering if a more robust check than the valid flag is needed,
> > like a checksum or something. I ended up with an invalid set of
> > metadata after a panic reboot (see below). I haven't debugged it yet,
> > so I'm not sure whether it's a logic bug or whether something got lost
> > during the reboot. But unless you can assume the panic didn't affect
> > writing the metadata, it could be partially written and shouldn't be
> > trusted?
> >
> > [...]
> >> +
> >> +static int tmc_etr_sync_prevboot_buf(struct tmc_drvdata *drvdata)
> >> +{
> >> +	u32 status;
> >> +	u64 rrp, rwp, dba;
> >> +	struct tmc_register_snapshot *reg_ptr;
> >> +	struct etr_buf *etr_buf = drvdata->prevboot_buf;
> >> +
> >> +	reg_ptr = drvdata->metadata.vaddr;
> >> +
> >> +	rrp = reg_ptr->rrp;
> >> +	rwp = reg_ptr->rwp;
> >> +	dba = reg_ptr->dba;
> >> +	status = reg_ptr->sts;
> >> +
> >> +	etr_buf->full = !!(status & TMC_STS_FULL);
> >> +
> >> +	/* Sync the buffer pointers */
> >> +	etr_buf->offset = rrp - dba;
> >> +	if (etr_buf->full)
> >> +		etr_buf->len = etr_buf->size;
> >> +	else
> >> +		etr_buf->len = rwp - rrp;
> >> +
> >> +	/* Sanity checks for validating metadata */
> >> +	if ((etr_buf->offset > etr_buf->size) ||
> >> +	    (etr_buf->len > etr_buf->size))
> >> +		return -EINVAL;
> >
> > The values I got here are 0x781b67182aa346f9, 0x8000000 and 0x8000000
> > for offset, size and len respectively, so the first check fails. It
> > would also be nice to print a message here (dev_dbg or similar), since
> > this is essentially the same as the valid check above, which does have one.
> >
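A standalone sketch of the pointer sync with the sanity checks moved ahead of the arithmetic and a message on failure. The structs are hypothetical stand-ins for tmc_register_snapshot and etr_buf, fprintf() stands in for dev_dbg()/dev_err(), and TMC_STS_FULL as bit 0 is an assumption meant to mirror the driver header. Checking rrp and rwp against [dba, dba + size] before subtracting means the unsigned subtractions cannot wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: STS.Full is bit 0, as in the driver's coresight-tmc.h. */
#define TMC_STS_FULL (1U << 0)

/* Hypothetical subset of the captured register snapshot. */
struct prevboot_regs {
	uint64_t rrp, rwp, dba;
	uint32_t sts;
};

/* Hypothetical stand-in for struct etr_buf; size is the reserved region size. */
struct prevboot_buf {
	uint64_t offset, len, size;
	bool full;
};

/* Returns 0 on success, -1 if the snapshot cannot describe this buffer. */
static int sync_prevboot(const struct prevboot_regs *r, struct prevboot_buf *b)
{
	/* Both pointers must lie in [dba, dba + size]; test before subtracting. */
	if (r->rrp < r->dba || r->rrp - r->dba > b->size ||
	    r->rwp < r->dba || r->rwp - r->dba > b->size) {
		fprintf(stderr,
			"prevboot: rrp %#llx / rwp %#llx outside buffer at %#llx (size %#llx)\n",
			(unsigned long long)r->rrp, (unsigned long long)r->rwp,
			(unsigned long long)r->dba, (unsigned long long)b->size);
		return -1;
	}

	b->full = (r->sts & TMC_STS_FULL) != 0;
	b->offset = r->rrp - r->dba;
	if (b->full) {
		b->len = b->size;
	} else if (r->rwp >= r->rrp) {
		/* Assumption, mirroring the patch: a non-full buffer holds rrp..rwp. */
		b->len = r->rwp - r->rrp;
	} else {
		fprintf(stderr, "prevboot: rwp behind rrp on a non-full buffer\n");
		return -1;
	}
	return 0;
}
```

With the reported offset of 0x781b67182aa346f9 against a 0x8000000 buffer, the first range check fails and a message is printed. A range check only rejects garbage that happens to fall outside the buffer; it cannot show that in-range metadata is genuine.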
> 
> So I debugged it, and the issue is that after the panic I was doing a cold
> boot rather than a warm boot, so the memory was being randomised.
> 
> The 0x8000000 values only looked initialised because they are based on the
> reserved region size rather than on anything from the metadata. When I
> examined the metadata, it was all randomised.
> 
> That leads me to think that the single bit for 'valid' is insufficient.
> There is a simple hashing function in include/linux/stringhash.h that we could
> use on the whole metadata struct, but that specifically says:
> 
>  * These hash functions are NOT GUARANTEED STABLE between kernel
>  * versions, architectures, or even repeated boots of the same kernel.
>  * (E.g. they may depend on boot-time hardware detection or be
>  * deliberately randomized.)
> 
> Although I'm not sure how true the "repeated boots of the same kernel" part
> is.
> 
> Maybe something in include/crypto/hash.h could be used instead, or we could
> make our own simple hash.
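One way to make "our own simple hash" concrete is a magic value plus a CRC-32 over the whole metadata struct, written last at capture time and re-checked before the contents are trusted. CRC-32 is a fixed, standardised function, so unlike the stringhash.h helpers its result does not change between boots or kernel versions. Minimal user-space sketch: struct prevboot_meta, its field layout and PREVBOOT_MAGIC are hypothetical. In the kernel, crc32_le() from <linux/crc32.h> could probably replace the hand-written loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PREVBOOT_MAGIC 0x54524345u /* hypothetical marker value */

/* Hypothetical layout; explicit padding so no indeterminate bytes are hashed. */
struct prevboot_meta {
	uint32_t magic;
	uint32_t crc;      /* CRC-32 of the struct with this field zeroed */
	uint32_t sts;
	uint32_t reserved;
	uint64_t rrp, rwp, dba;
};

/* Bitwise CRC-32 (IEEE 802.3, reflected polynomial 0xEDB88320). */
static uint32_t crc32_ieee(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int i = 0; i < 8; i++)
			crc = (crc >> 1) ^ (0xEDB88320u & -(crc & 1u));
	}
	return ~crc;
}

/* CRC over a copy of the struct with the crc field cleared. */
static uint32_t meta_crc(const struct prevboot_meta *m)
{
	struct prevboot_meta tmp = *m;

	tmp.crc = 0;
	return crc32_ieee(&tmp, sizeof(tmp));
}

/* Call last during capture, after every other field has been written. */
static void meta_seal(struct prevboot_meta *m)
{
	m->magic = PREVBOOT_MAGIC;
	m->crc = meta_crc(m);
}

/* Accept only metadata carrying the magic and a matching CRC. */
static int meta_valid(const struct prevboot_meta *m)
{
	return m->magic == PREVBOOT_MAGIC && m->crc == meta_crc(m);
}
```

Sealing last means randomised memory after a cold boot, or a capture interrupted part-way through, is very likely to fail the check, whereas a single valid bit has a one-in-two chance of looking set.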

Thanks for the pointers. Will take a look at it.