[PATCH V17 2/7] soc: qcom: dcc: Add driver support for Data Capture and Compare unit(DCC)

Tue Nov 8 21:40:49 PST 2022

On 11/7/2022 11:12 AM, Souradeep Chowdhury wrote:
> 
> 
> On 11/1/2022 12:21 AM, Alex Elder wrote:
>> On 10/21/22 2:14 AM, Souradeep Chowdhury wrote:
>>>
>>>
>>> On 10/21/2022 5:37 AM, Alex Elder wrote:
>>>> On 10/14/22 1:00 AM, Souradeep Chowdhury wrote:
>>>>> The DCC is a DMA Engine designed to capture and store data
>>>>> during system crash or software triggers. The DCC operates
>>>>> based on user inputs via the debugfs interface. The user gives
>>>>> addresses as inputs and these addresses are stored in the
>>>>> dcc sram. In case of a system crash or a manual software
>>>>> trigger by the user through the debugfs interface,
>>>>> the dcc captures and stores the values at these addresses.
>>>>> This patch contains the driver which has all the methods
>>>>> pertaining to the debugfs interface, auxiliary functions to
>>>>> support all the four fundamental operations of dcc namely
>>>>> read, write, read/modify/write and loop. The probe method
>>>>> here instantiates all the resources necessary for dcc to
>>>>> operate mainly the dedicated dcc sram where it stores the
>>>>> values. The DCC driver can be used for debugging purposes
>>>>> without going for a reboot since it can perform software
>>>>> triggers as well based on user inputs.
>>>>>
>>>>> Also added the documentation for debugfs entries and explained
>>>>> the functionalities of each debugfs file that has been created
>>>>> for dcc.
>>>>>
>>>>> The following is the justification of using debugfs interface
>>>>> over the other alternatives like sysfs/ioctls
>>>>>
>>>>> i) As can be seen from the debugfs attribute descriptions,
>>>>> some of the debugfs attribute files here contain multiple
>>>>> arguments which need to be accepted from the user. This goes
>>>>> against the design style of sysfs.
>>>>>
>>>>> ii) The user input patterns have been made simple and convenient
>>>>> in this case with the use of the debugfs interface, as the user
>>>>> doesn't need to shuffle between different files to execute one
>>>>> instruction, as was the case when using the other alternatives.
>>>>>
>>>>> Signed-off-by: Souradeep Chowdhury <quic_schowdhu at quicinc.com>
>>>>
>>>> I haven't followed any review feedback you have received
>>>> since version 8 (which I reviewed), so if I say something
>>>> that conflicts with other feedback I apologize.  I know
>>>> Bjorn had some comments too, so you're already going to
>>>> send another version.
>>>>
>>>> Unfortunately I have some more input, including some things
>>>> that are basically bugs (because buffers could be overrun).
>>>> I will plan to review again once you've had a chance to
>>>> address my comments.
>>>>
>>>>                      -Alex
>>>
>>> Thanks for the review. Will be sending out the next version 
>>> implementing Bjorn's and your comments.
>>
>> Sorry for my delayed response.  Your message didn't show up
>> in my "normal" mail box so I'm catching up now.
>>
>> . . .
>>
>>>>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
>>>>> index d66604a..b1fe812 100644
>>>>> --- a/drivers/soc/qcom/Makefile
>>>>> +++ b/drivers/soc/qcom/Makefile
>>>>> @@ -4,6 +4,7 @@ obj-$(CONFIG_QCOM_AOSS_QMP) +=    qcom_aoss.o
>>>>>   obj-$(CONFIG_QCOM_GENI_SE) +=    qcom-geni-se.o
>>>>>   obj-$(CONFIG_QCOM_COMMAND_DB) += cmd-db.o
>>>>>   obj-$(CONFIG_QCOM_CPR)        += cpr.o
>>>>> +obj-$(CONFIG_QCOM_DCC) += dcc.o
>>>>>   obj-$(CONFIG_QCOM_GSBI)    +=    qcom_gsbi.o
>>>>>   obj-$(CONFIG_QCOM_MDT_LOADER)    += mdt_loader.o
>>>>>   obj-$(CONFIG_QCOM_OCMEM)    += ocmem.o
>>>>> diff --git a/drivers/soc/qcom/dcc.c b/drivers/soc/qcom/dcc.c
>>>>> new file mode 100644
>>>>> index 0000000..efad225
>>>>> --- /dev/null
>>>>> +++ b/drivers/soc/qcom/dcc.c
>>
>> . . .
>>
>>>> Then you use DCC_ADDR_RANGE_MASK to truncate an address
>>>> provided down to a multiple of 16 bytes.  Why is that?
>>>> Is there a hardware limitation that makes 16 byte alignment
>>>> necessary?  (A little more below, where they're used.)
>>>
>>> Yes, this is necessary as per the dcc_sram hardware configuration.
>>
>> OK.  I assumed that, but it's worth mentioning that
>> somewhere (perhaps you already did, and I just missed it).
> 
> Ack
> 
>>
>> . . .
>>
>>>> I have some questions about the way memory regions
>>>> are defined here.
>>>>
>>>> - You round down the address using DCC_ADDR_RANGE_MASK.
>>>>    Is that because the address has an alignment requirement?
>>>> - DCC_ADDR_RANGE_MASK is 0xfffffff0, meaning it's 16-byte
>>>>    aligned.  Is that the required alignment?  (It is more
>>>>    strict than the 32-bit word size.)
>>>> - Is there any requirement on the size (in bytes)?  I.e.,
>>>>    does it need to be 16-byte aligned?  (You multiply the
>>>>    count by 4, which I presume is sizeof(u32), the word size.)
>>>> - If the base address is affected by rounding down like
>>>>    this, you aren't updating the length, which it seems
>>>>    could omit a word at the end of the desired range.
>>>> - You are checking to be sure the word count doesn't exceed
>>>>    the RAM size.  But you're using DCC_ADDR_OFF_RANGE=8,
>>>>    even though you said that a "word" is 32 bits.
>>>
>>> The check for the DCC_ADDR_OFF_RANGE=8 is to give an arbitrary
>>> restriction in word length for the dcc configuration but ideally it
>>> should be 4 as dcc sram word length is 4, will be changing this 
>>> accordingly.
>>
>> I think that will be clearer.  Using the word length avoids
>> any need to explain why 8 was being used.
> 
> Ack
> 
>>
>>> Also the base address alignment requirement is consistent as per the
>>> DCC hardware specification. The address range has to be 16 byte
>>> aligned.
>>
>> So you're saying the size in bytes also has this requirement?
>> If so, then it's good you'll enforce it.
> 
> Ack
> 
>>
>>>>
>>>>> +    if (!len || len > drvdata->ram_size / DCC_ADDR_OFF_RANGE) {
>>>>> +        dev_err(drvdata->dev, "DCC: Invalid length\n");
>>>>> +        ret = -EINVAL;
>>>>> +        goto out_unlock;
>>>>> +    }
>>>>> +
>>>>> +    base = addr & DCC_ADDR_RANGE_MASK;
>>>> Maybe:
>>>>      base = round_down(addr, DCC_WORD_SIZE);
>>>>
>>>> Then you don't even need DCC_ADDR_RANGE_MASK.
>>>>
>>>> And then:
>>>>      len += addr - base;
>>>> And if necessary:
>>>>      len = round_up(len, DCC_WORD_SIZE);
>>>> And finally:
>>>>      if (len > drvdata->ram_size / DCC_WORD_SIZE)
>>>>          return -EINVAL;
>>>
>>> Ack
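The alignment and length handling discussed above can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the driver code: `dcc_align_range()` and `struct dcc_range` are hypothetical names, the 16-byte base alignment and 4-byte word size are taken from the discussion, and the length is assumed to be counted in 32-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

#define DCC_WORD_SIZE   4u   /* assumed: bytes per DCC SRAM word */
#define DCC_ADDR_ALIGN  16u  /* assumed: required base address alignment */

/* Hypothetical result type: aligned base plus adjusted word count. */
struct dcc_range {
    uint32_t base;   /* start address rounded down to DCC_ADDR_ALIGN */
    uint32_t words;  /* number of 32-bit words to capture */
};

/*
 * Round addr down to the required alignment and grow the word count so
 * the end of the requested range is still covered. Returns false if the
 * request is empty or does not fit in ram_size bytes of SRAM.
 */
bool dcc_align_range(uint32_t addr, uint32_t words, uint32_t ram_size,
                     struct dcc_range *out)
{
    uint32_t max_words = ram_size / DCC_WORD_SIZE;
    uint32_t base, extra;

    if (words == 0 || words > max_words)
        return false;

    base = addr & ~(DCC_ADDR_ALIGN - 1);

    /* Bytes skipped by rounding down, expressed in whole words. */
    extra = (addr - base + DCC_WORD_SIZE - 1) / DCC_WORD_SIZE;
    words += extra;

    /* Re-check after the adjustment, since the range may have grown. */
    if (words > max_words)
        return false;

    out->base = base;
    out->words = words;
    return true;
}
```

For example, a request for 2 words at 0x1008 becomes 4 words starting at 0x1000, so the last requested word is not dropped by the rounding.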
>>
>> . . .
>>
>>>>> +    if (ret)
>>>>> +        return -EFAULT;
>>>>> +    if (count > sizeof(buf) || count == 0)
>>>>> +        return -EINVAL;
>>>>> +
>>>>> +    curr_list = dcc_filp_curr_list(filp);
>>>>> +    if (curr_list < 0)
>>>>> +        return curr_list;
>>>>> +
>>>>> +    if (buf[count - 1] == '\n')
>>>>> +        buf[count - 1] = '\0';
>>>>> +    else
>>>>> +        return -EINVAL;
>>>> Why is it important for the input buffer to end in newline?
>>>
>>> We are using the newline to convert the input buffer into a string
>>> for strsep operations.
>>
>> But strsep() returns the entire string if it finds the '\0'
>> before finding any of the delimiters.  So the effect should
>> be the same.  It's possible I'm misunderstanding but I think
>> there's no need for this check at all.
> 
> Ack
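To illustrate the point about strsep(), here is a small userspace sketch. It assumes the input has already been NUL-terminated, and it uses `sep_token()`, a hypothetical portable stand-in with the same contract as strsep(), so that it builds without BSD/glibc extensions. `split_args()` and the sample input strings are likewise made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Portable stand-in with the same contract as strsep(): return the next
 * token and advance *stringp past the delimiter. If no delimiter is
 * found, the whole remaining string is the token and *stringp is NULL.
 */
char *sep_token(char **stringp, const char *delim)
{
    char *start = *stringp;
    char *end;

    if (start == NULL)
        return NULL;

    end = strpbrk(start, delim);
    if (end != NULL) {
        *end = '\0';
        *stringp = end + 1;
    } else {
        *stringp = NULL;
    }
    return start;
}

/*
 * Split line into at most max_tok tokens. A trailing newline is accepted
 * but not required, because '\n' is simply treated as one more delimiter.
 * The buffer is modified in place. Returns the number of tokens stored.
 */
int split_args(char *line, char **tok, int max_tok)
{
    char *cursor = line;
    char *t;
    int n = 0;

    while (n < max_tok && (t = sep_token(&cursor, " \n")) != NULL) {
        if (*t == '\0')
            continue;  /* skip empty tokens from adjacent delimiters */
        tok[n++] = t;
    }
    return n;
}
```

With this approach "R 0x1000 4\n" and "R 0x1000 4" both yield the same three tokens, so rejecting input that lacks a trailing newline is not needed.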
> 
>>
>>>>> +    /* EOF check */
>>>>> +    if (*ppos >= drvdata->ram_size)
>>>>> +        return 0;
>>>>> +
>>>>> +    if ((*ppos + len) > drvdata->ram_size)
>>>>> +        len = (drvdata->ram_size - *ppos);
>>>>> +
>>>>> +    buf = kzalloc(len, GFP_KERNEL);
>>>>
>>>> Now that you are using memremap() rather than ioremap()
>>>> for the ram_base memory, I don't think you have any need
>>>> to allocate a buffer here anymore.
>>>
>>> Ack. As per Bjorn's comments this should be ioremaped.
>>
>> OK, sorry, I didn't notice that.
>>
>>> Can you please clarify whether this should be mapped with
>>> memremap() or ioremap()?
>>
>> The reason I suggested memremap() was that the region you
>> are mapping is being treated as a block of RAM.  Bjorn
>> might know something about this that I don't know...
>>
>> Here's an early LWN article which (at the end) explains
>> why/when one might want to use memremap().
>>    https://lwn.net/Articles/653585/
>> Where I have used it, I pass MEMREMAP_WC as the flag.
> 
> Thanks for sharing this. Will also wait for Bjorn's
> take on this.

As per discussion with Bjorn, since dcc_sram is I/O memory and not DDR
memory, we should ioremap() it. Will be sending out the next version
accordingly.

> 
>>
>>>>
>>>>> +    if (!buf)
>>>>> +        return -ENOMEM;
>>>>> +
>>>>> +    memcpy(buf, drvdata->ram_base + *ppos, len);
>>>>
>>>> That is, you can simply copy_to_user() into the (user)
>>>> data pointer, from drvdata->ram_base + *ppos.  Maybe
>>>> something like:
>>>>
>>>>      void *src;
>>>>      /* ... */
>>>>
>>>>      src = drvdata->ram_base + *ppos;
>>>>      if (copy_to_user(data, src, len))
>>>>          return -EFAULT;
>>>>
>>>
>>> Ack
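The EOF check and length clamp in the read path can be sketched in userspace C as below. This is a minimal sketch, not the driver code: `region_read()` is a hypothetical name, and plain memcpy() stands in for whatever copy primitive the final mapping requires (with an ioremap()ed region, the copy would presumably go through memcpy_fromio() and a bounce buffer rather than straight to user space).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to len bytes out of a fixed-size region, starting at *pos.
 * Returns the number of bytes copied and advances *pos, or returns 0
 * once *pos has reached the end of the region.
 */
size_t region_read(void *dst, size_t len, const void *region,
                   size_t region_size, size_t *pos)
{
    /* EOF check */
    if (*pos >= region_size)
        return 0;

    /* Clamp using subtraction so that *pos + len cannot overflow. */
    if (len > region_size - *pos)
        len = region_size - *pos;

    memcpy(dst, (const unsigned char *)region + *pos, len);
    *pos += len;

    return len;
}
```

Reading 5 bytes at a time from an 8-byte region returns 5, then 3, then 0.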
>>>
>>>>> +    if (copy_to_user(data, buf, len)) {
>>>>> +        kfree(buf);
>>>>> +        return -EFAULT;
>>>>> +    }
>>>>> +
>>>>> +    *ppos += len;
>>>>> +
>>>>> +    kfree(buf);
>>>>> +
>>>>> +    return len;
>>>>> +}
>>>>> +
>>>>> +static const struct file_operations dcc_sram_fops = {
>>>>> +    .owner        = THIS_MODULE,
>>>>> +    .read        = dcc_sram_read,
>>>>> +    .llseek        = no_llseek,
>>>>> +};
>>>>> +
>>>>> +static int dcc_sram_dev_init(struct dcc_drvdata *drvdata)
>>>>> +{
>>>>> +    drvdata->sram_dev.minor = MISC_DYNAMIC_MINOR;
>>>>> +    drvdata->sram_dev.name = DCC_SRAM_NODE;
>>>>> +    drvdata->sram_dev.fops = &dcc_sram_fops;
>>>>> +
>>>>> +    return misc_register(&drvdata->sram_dev);
>>>>> +}
>>>>> +
>>>>> +static void dcc_sram_dev_exit(struct dcc_drvdata *drvdata)
>>>>> +{
>>>>> +    misc_deregister(&drvdata->sram_dev);
>>>>> +}
>>>>> +
>>>>> +static int dcc_probe(struct platform_device *pdev)
>>>>> +{
>>>>> +    u32 val;
>>>>> +    int ret = 0, i;
>>>>> +    struct device *dev = &pdev->dev;
>>>>> +    struct dcc_drvdata *dcc;
>>>>
>>>> Why do you use "dcc" here and "drvdata" elsewhere?
>>>
>>> This was renamed in probe as per prior review comment.
>>
>> I don't know who suggested that (maybe me?), but I guess I
>> prefer using the same (base) name for variables of a given
>> type.  So if you call it "dcc" here, then maybe call it
>> "dcc" everywhere.
>>
>> I haven't looked closely at your patch just now, but it's
>> possible the "struct dcc_drvdata" type could simply be
>> "struct dcc".  That is, a "dcc" structure represents a
>> single "dcc" instance, and you happen to store a copy of
>> that "dcc" pointer as the device's drvdata.
>>
>> Something for you to consider, but this isn't as important
>> a suggestion as a few other comments I've made.
> 
> Ack
> 
>>
>>>>> +    struct resource *res;
>>>>> +
>>>>> +    dcc = devm_kzalloc(dev, sizeof(*dcc), GFP_KERNEL);
>>>>> +    if (!dcc)
>>>>> +        return -ENOMEM;
>>>>> +
>>>>> +    dcc->dev = &pdev->dev;
>>>>> +    platform_set_drvdata(pdev, dcc);
>>>>> +
>>>>> +    dcc->base = devm_platform_ioremap_resource(pdev, 0);
>>>>> +    if (IS_ERR(dcc->base))
>>>>> +        return PTR_ERR(dcc->base);
>>>>> +
>>>>> +    res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>>>>> +    if (!res)
>>>>> +        return -ENODEV;
>>>>> +
>>>>> +    dcc->ram_base = memremap(res->start, resource_size(res), MEMREMAP_WB);
>>>>> +    if (!dcc->ram_base)
>>>>> +        return -ENODEV;
>>
>> . . .
>>
>>>>> +    /* Either set the fixed loop offset or calculate it
>>>>> +     * from ram_size. Max consecutive addresses the
>>>>> +     * dcc can loop is equivalent to the ram size
>>>>> +     */
>>>>> +    if (val & DCC_LOOP_OFFSET_MASK)
>>>>> +        dcc->loopoff = DCC_FIX_LOOP_OFFSET;
>>>>> +    else
>>>>> +        dcc->loopoff = get_bitmask_order((dcc->ram_size +
>>>>> +                dcc->ram_offset) / 4 - 1);
>>>>
>>>> Here's what I said about the above last time:
>>>>
>>>>    This get_bitmask_order() call to determine the offset of a
>>>>    register seems overly clever.  I think it warrants a little
>>>>    explanation why it's determined by the size of SRAM.
>>>>
>>>> I think part of what confuses me is why you use the sum
>>>> of ram_size and ram_offset.  I suppose 4 is DCC_WORD_SIZE
>>>> but I just don't know.  The comment I was suggesting was
>>>> something about what loopoff actually represents, and why
>>>> it's calculated this way.
>>>
>>> As mentioned in the comment above, the loopoff stands for the max
>>> consecutive addresses that can be given to the loop instruction. We
>>> are restricting it as per the total words that can be accommodated in
>>> the dcc_sram.
>>
>> So you're taking the ram_size + ram_offset, which is the
>> address just beyond the end of RAM.  (Right?)
>>
>> Then you divide it by 4 (because 4 is the size of a "word"?).
>> So the result would be the end of RAM expressed as "words".
>>
>> Then you subtract 1, which means "last word within RAM".
>>
>> I think there are two things I find confusing:
>> - Why do you use ram_size + ram_offset?  The comment you
>>    added even says "Max consecutive addresses the dcc can
>>    loop is equivalent to the ram size", and that sounds
>>    like the loop_offset calculation should be working
>>    *only* with ram_size.
>> - You call get_bitmask_order() on this value, and I just
>>    don't see how that is related to a loop offset.
>>
>> (Again, I'm not looking closely at the code right now, so
>> maybe I'm just forgetting something about the way this memory
>> is laid out.)
> 
> Yes, this is in conjunction with what is expected for loop instructions.
> We are setting the most significant bit based on the number of words
> that can be accommodated inside the dcc_sram (the last word), then using
> that to compare with the actual loop_cnt that is entered for the loop
> instructions. Will add further details to my comment for clarity.
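To make the arithmetic concrete, here is a userspace sketch of the quoted loopoff expression. It assumes, to my understanding, that get_bitmask_order() returns fls() of its argument, i.e. the 1-based position of the most significant set bit. `fls32()` and `dcc_loop_offset()` are hypothetical names, and the sizes in the example are made up rather than taken from any real SoC.

```c
#include <stdint.h>

/*
 * Userspace equivalent of fls(): 1-based index of the most significant
 * set bit, or 0 if no bit is set.
 */
int fls32(uint32_t x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/*
 * Mirrors the quoted expression: the number of bits needed to represent
 * the index of the last 32-bit word below ram_size + ram_offset.
 */
int dcc_loop_offset(uint32_t ram_size, uint32_t ram_offset)
{
    return fls32((ram_size + ram_offset) / 4 - 1);
}
```

With an assumed ram_size of 0x2000 and ram_offset of 0, the last word index is 0x7ff, so the result is 11. With ram_offset of 0x2000 it becomes 12, which shows how including ram_offset changes the result.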
> 
> 
> 
>>
>> Thanks.
>>
>>                      -Alex
>>