[PATCH V17 2/7] soc: qcom: dcc: Add driver support for Data Capture and Compare unit(DCC)
Souradeep Chowdhury
quic_schowdhu at quicinc.com
Tue Nov 8 21:40:49 PST 2022
On 11/7/2022 11:12 AM, Souradeep Chowdhury wrote:
>
>
> On 11/1/2022 12:21 AM, Alex Elder wrote:
>> On 10/21/22 2:14 AM, Souradeep Chowdhury wrote:
>>>
>>>
>>> On 10/21/2022 5:37 AM, Alex Elder wrote:
>>>> On 10/14/22 1:00 AM, Souradeep Chowdhury wrote:
>>>>> The DCC is a DMA Engine designed to capture and store data
>>>>> during system crash or software triggers. The DCC operates
>>>>> based on user inputs via the debugfs interface. The user gives
>>>>> addresses as inputs and these addresses are stored in the
>>>>> dcc sram. In case of a system crash or a manual software
>>>>> trigger by the user through the debugfs interface,
>>>>> the dcc captures and stores the values at these addresses.
>>>>> This patch contains the driver which has all the methods
>>>>> pertaining to the debugfs interface, auxiliary functions to
>>>>> support all the four fundamental operations of dcc namely
>>>>> read, write, read/modify/write and loop. The probe method
>>>>> here instantiates all the resources necessary for dcc to
>>>>> operate mainly the dedicated dcc sram where it stores the
>>>>> values. The DCC driver can be used for debugging purposes
>>>>> without going for a reboot since it can perform software
>>>>> triggers as well based on user inputs.
>>>>>
>>>>> Also added the documentation for debugfs entries and explained
>>>>> the functionalities of each debugfs file that has been created
>>>>> for dcc.
>>>>>
>>>>> The following is the justification of using debugfs interface
>>>>> over the other alternatives like sysfs/ioctls
>>>>>
>>>>> i) As can be seen from the debugfs attribute descriptions,
>>>>> some of the debugfs attribute files here contains multiple
>>>>> arguments which needs to be accepted from the user. This goes
>>>>> against the design style of sysfs.
>>>>>
>>>>> ii) The user input patterns have been made simple and convenient
>>>>> in this case with the use of debugfs interface as user doesn't
>>>>> need to shuffle between different files to execute one instruction
>>>>> as was the case on using other alternatives.
>>>>>
>>>>> Signed-off-by: Souradeep Chowdhury <quic_schowdhu at quicinc.com>
>>>>
>>>> I haven't followed any review feedback you have received
>>>> since verion 8 (which I reviewed), so if I say something
>>>> that conflicts with other feedback I apologize. I know
>>>> Bjorn had some comments too, so you're already going to
>>>> send another version.
>>>>
>>>> Unfortunately I have some more input, including some things
>>>> that are basically bugs (because buffers could be overrun).
>>>> I will plan to review again once you've had a chance to
>>>> address my comments.
>>>>
>>>> -Alex
>>>
>>> Thanks for the review. Will be sending out the next version
>>> implementing Bjorn's and your comments.
>>
>> Sorry for my delayed response. Your message didn't show up
>> in my "normal" mail box so I'm catching up now.
>>
>> . . .
>>
>>>>> diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile
>>>>> index d66604a..b1fe812 100644
>>>>> --- a/drivers/soc/qcom/Makefile
>>>>> +++ b/drivers/soc/qcom/Makefile
>>>>> @@ -4,6 +4,7 @@ obj-$(CONFIG_QCOM_AOSS_QMP) += qcom_aoss.o
>>>>> obj-$(CONFIG_QCOM_GENI_SE) += qcom-geni-se.o
>>>>> obj-$(CONFIG_QCOM_COMMAND_DB) += cmd-db.o
>>>>> obj-$(CONFIG_QCOM_CPR) += cpr.o
>>>>> +obj-$(CONFIG_QCOM_DCC) += dcc.o
>>>>> obj-$(CONFIG_QCOM_GSBI) += qcom_gsbi.o
>>>>> obj-$(CONFIG_QCOM_MDT_LOADER) += mdt_loader.o
>>>>> obj-$(CONFIG_QCOM_OCMEM) += ocmem.o
>>>>> diff --git a/drivers/soc/qcom/dcc.c b/drivers/soc/qcom/dcc.c
>>>>> new file mode 100644
>>>>> index 0000000..efad225
>>>>> --- /dev/null
>>>>> +++ b/drivers/soc/qcom/dcc.c
>>
>> . . .
>>
>>>> Then you use DCC_ADDR_RANGE_MASK to truncate an address
>>>> provided down to a multiple of 16 bytes. Why is that?
>>>> Is there a hardware limitation that makes 16 byte alignment
>>>> necessary? (A little more below, where they're used.)
>>>
>>> Yes,this is necessary as per dcc_sram hardware configuraton.
>>
>> OK. I assumed that, but it's worth mentioning that
>> somewhere (perhaps you already did, and I just missed it).
>
> Ack
>
>>
>> . . .
>>
>>>> I have some questions about the way memory regions
>>>> are defined here.
>>>>
>>>> - You round down the address using DCC_ADDR_RANGE_MASK.
>>>> Is that because the address has an alignment requirement?
>>>> - DCC_ADDR_RANGE_MASK is 0xfffffff0, meaning it's 16-byte
>>>> aligned. Is that the required alignment? (It is more
>>>> strict than the 32-bit word size.)
>>>> - Is there any requirement on the size (in bytes)? I.e.,
>>>> does it need to be 16-byte aligned? (You multiply the
>>>> count by 4, which I presume is sizeof(u32), the word size.)
>>>> - If the base address is affected by rounding down like
>>>> this, you aren't updating the length, which it seems
>>>> could omit a word at the end of the desired range.
>>>> - You are checking to be sure the word count doesn't exceed
>>>> the RAM size. But you're using DCC_ADDR_OFF_RANGE=8,
>>>> even though you said that a "word" is 32 bits.
>>>
>>> The check for the DCC_ADDR_OFF_RANGE=8 is to give an arbitrary
>>> restriction in word length for the dcc configuration but ideally it
>>> should be 4 as dcc sram word length is 4, will be changing this
>>> accordingly.
>>
>> I think that will be clearer. Using the word length avoids
>> any need to explain why 8 was being used.
>
> Ack
>
>>
>>> Also the base address alignment requirement is consistent as per the
>>> DCC hardware specification. The address range has to be 16 byte
>>> aligned.
>>
>> So you're saying the size in bytes also has this requirement?
>> If so, then it's good you'll enforce it.
>
> Ack
>
>>
>>>>
>>>>> + if (!len || len > drvdata->ram_size / DCC_ADDR_OFF_RANGE) {
>>>>> + dev_err(drvdata->dev, "DCC: Invalid length\n");
>>>>> + ret = -EINVAL;
>>>>> + goto out_unlock;
>>>>> + }
>>>>> +
>>>>> + base = addr & DCC_ADDR_RANGE_MASK;
>>>> Maybe:
>>>> base = round_down(addr, DCC_WORD_SIZE);
>>>>
>>>> Then you don't even need DCC_ADDR_RANGE_MASK.
>>>>
>>>> And then:
>>>> len += base - addr;
>>>> And if necessary:
>>>> len = round_up(addr, DCC_WORD_SIZE);
>>>> And finally:
>>>> if (len > drvdata->ram_size / DCC_WORD_SIZE)
>>>> return -EINVAL;
>>>
>>> Ack
>>
>> . . .
>>
>>>>> + if (ret)
>>>>> + return -EFAULT;
>>>>> + if (count > sizeof(buf) || count == 0)
>>>>> + return -EINVAL;
>>>>> +
>>>>> + curr_list = dcc_filp_curr_list(filp);
>>>>> + if (curr_list < 0)
>>>>> + return curr_list;
>>>>> +
>>>>> + if (buf[count - 1] == '\n')
>>>>> + buf[count - 1] = '\0';
>>>>> + else
>>>>> + return -EINVAL;
>>>> Why is it important for the input buffer to end in newline?
>>>
>>> We are using the newline to convert the input buffer into a string
>>>
>>> for strsep operations.
>>
>> But strsep() returns the entire string if it finds the '\0'
>> before finding any of the delimiters. So the effect should
>> be the same. It's possible I'm misunderstanding but I think
>> there's no need for this check at all.
>
> Ack
>
>>
>>>>> + /* EOF check */
>>>>> + if (*ppos >= drvdata->ram_size)
>>>>> + return 0;
>>>>> +
>>>>> + if ((*ppos + len) > drvdata->ram_size)
>>>>> + len = (drvdata->ram_size - *ppos);
>>>>> +
>>>>> + buf = kzalloc(len, GFP_KERNEL);
>>>>
>>>> Now that you are using memremap() rather than ioremap()
>>>> for the ram_base memory, I don't think you have any need
>>>> to allocate a buffer here anymore.
>>>
>>> Ack. As per Bjorn's comments this should be ioremaped.
>>
>> OK, sorry, I didn't notice that.
>>
>>> Can you please clarify whether this should be mapped to
>>>
>>> mem or ioremap?
>>
>> The reason I suggested memremap() was that the region you
>> are mapping is being treated as a block of RAM. Bjorn
>> might know something about this that I don't know...
>>
>> Here's an early LWN article which (at the end) explains
>> why/when one might want to use memremap().
>> https://lwn.net/Articles/653585/
>> Where I have used it, I pass MEMREMAP_WC as the flag.
>
> Thanks for sharing this. Will also wait for Bjorn's
> take on this.
As per discussion with Bjorn, since dcc_sram is an io and not DDR
memory, we should ioremap it. Will be sending out the next version
accordingly.
>
>>
>>>>
>>>>> + if (!buf)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + memcpy(buf, drvdata->ram_base + *ppos, len);
>>>>
>>>> That is, you can simply copy_to_user() into the (user)
>>>> data pointer, from drvdata->ram_base + *ppos. Maybe
>>>> something like:
>>>>
>>>> void *src;
>>>> /* ... */
>>>>
>>>> src = drvdata->ram_base + *ppos;
>>>> if (copy_to_user(data, src, len))
>>>> return -EFAULT;
>>>>
>>>
>>> Ack
>>>
>>>>> + if (copy_to_user(data, buf, len)) {
>>>>> + kfree(buf);
>>>>> + return -EFAULT;
>>>>> + }
>>>>> +
>>>>> + *ppos += len;
>>>>> +
>>>>> + kfree(buf);
>>>>> +
>>>>> + return len;
>>>>> +}
>>>>> +
>>>>> +static const struct file_operations dcc_sram_fops = {
>>>>> + .owner = THIS_MODULE,
>>>>> + .read = dcc_sram_read,
>>>>> + .llseek = no_llseek,
>>>>> +};
>>>>> +
>>>>> +static int dcc_sram_dev_init(struct dcc_drvdata *drvdata)
>>>>> +{
>>>>> + drvdata->sram_dev.minor = MISC_DYNAMIC_MINOR;
>>>>> + drvdata->sram_dev.name = DCC_SRAM_NODE;
>>>>> + drvdata->sram_dev.fops = &dcc_sram_fops;
>>>>> +
>>>>> + return misc_register(&drvdata->sram_dev);
>>>>> +}
>>>>> +
>>>>> +static void dcc_sram_dev_exit(struct dcc_drvdata *drvdata)
>>>>> +{
>>>>> + misc_deregister(&drvdata->sram_dev);
>>>>> +}
>>>>> +
>>>>> +static int dcc_probe(struct platform_device *pdev)
>>>>> +{
>>>>> + u32 val;
>>>>> + int ret = 0, i;
>>>>> + struct device *dev = &pdev->dev;
>>>>> + struct dcc_drvdata *dcc;
>>>>
>>>> Why do you use "dcc" here and "drvdata" elsewhere?
>>>
>>> This was renamed in probe as per prior review comment.
>>
>> I don't know who suggested that (maybe me?), but I guess I
>> prefer using the same (base) name for variables of a given
>> type. So if you call it "dcc" here, then maybe call it
>> "dcc" everywhere.
>>
>> I haven't looked closely at your patch just now, but it's
>> possible the "struct dcc_drvdata" type could simply be
>> "struct dcc". That is, a "dcc" structure represents a
>> single "dcc" instance, and you happen to store a copy of
>> that "dcc" pointer as the device's drvdata.
>>
>> Something for you to consider, but this isn't as important
>> a suggestion as a few other comments I've made.
>
> Ack
>
>>
>>>>> + struct resource *res;
>>>>> +
>>>>> + dcc = devm_kzalloc(dev, sizeof(*dcc), GFP_KERNEL);
>>>>> + if (!dcc)
>>>>> + return -ENOMEM;
>>>>> +
>>>>> + dcc->dev = &pdev->dev;
>>>>> + platform_set_drvdata(pdev, dcc);
>>>>> +
>>>>> + dcc->base = devm_platform_ioremap_resource(pdev, 0);
>>>>> + if (IS_ERR(dcc->base))
>>>>> + return PTR_ERR(dcc->base);
>>>>> +
>>>>> + res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>>>>> + if (!res)
>>>>> + return -ENODEV;
>>>>> +
>>>>> + dcc->ram_base = memremap(res->start, resource_size(res),
>>>>> MEMREMAP_WB);
>>>>> + if (!dcc->ram_base)
>>>>> + return -ENODEV;
>>
>> . . .
>>
>>>>> + /* Either set the fixed loop offset or calculate it
>>>>> + * from ram_size. Max consecutive addresses the
>>>>> + * dcc can loop is equivalent to the ram size
>>>>> + */
>>>>> + if (val & DCC_LOOP_OFFSET_MASK)
>>>>> + dcc->loopoff = DCC_FIX_LOOP_OFFSET;
>>>>> + else
>>>>> + dcc->loopoff = get_bitmask_order((dcc->ram_size +
>>>>> + dcc->ram_offset) / 4 - 1);
>>>>
>>>> Here's what I said about the
> above last time:
>>>>
>>>> This get_bitmask_order() call to determine the offset of a
>>>> register seems overly clever. I think it warrants a little
>>>> explanation why it's determined by the size of SRAM.
>>>>
>>>> I think part of what confuses me is why you use the sum
>>>> of ram_size and ram_offset. I suppose 4 is DCC_WORD_SIZE
>>>> but I just don't know. The comment I was suggesting was
>>>> something about what loopoff actually represents, and why
>>>> it's calculated this way.
>>>
>>> As mentioned in the comment above, the loopoff stands for the max
>>>
>>> consecutive addresses that can be given to the loop instruction. We
>>>
>>> are restricting it as per the total words that can be accomodated in
>>>
>>> the dcc_sram.
>>
>> So you're taking the ram_size + ram_offset, which is the
>> the address just beyond the end of RAM. (Right?)
>>
>> Then you divide it by 4 (because 4 is the size of a "word"?).
>> To the result would be the end of RAM expressed as "words".
>>
>> Then you subtract 1, which means "last word within RAM".
>>
>> I think there are two things I find confusing:
>> - Why do you use ram_size + ram_offset? The comment you
>> added even says "Max consecutive addresses the dcc can
>> loop is equivalent to the ram size", and that sounds
>> like the loop_offset calculation should be working
>> *only* with ram_size.
>> - You call get_bitmask_order() on this value, and I just
>> don't see how that is related to a loop offset.
>>
>> (Again, I'm not looking closely at the code right now, so
>> maybe I'm just forgetting something about the way this memory
>> is laid out.)
>
> Yes, this is in conjunction with what is excepted for loop instructions.
> We are setting the most significant bit based on the number of words
> that can be accomodated inside a dcc_sram(the last word), then using
> that to compare with the actual loop_cnt that is entered for the loop
> instructions. Will add further details to my comment for clarity.
>
>
>
>>
>> Thanks.
>>
>> -Alex
>>
More information about the linux-arm-kernel
mailing list