[PATCH] clkdev: report over-sized strings when creating clkdev entries

Tue May 21 23:53:18 PDT 2024

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 18.05.24 09:01, Russell King (Oracle) wrote:
> On Fri, May 17, 2024 at 08:24:19PM -0700, Guenter Roeck wrote:
>> On 5/17/24 16:37, Russell King (Oracle) wrote:
>>> On Fri, May 17, 2024 at 04:34:06PM -0700, Guenter Roeck wrote:
>>>> On 5/17/24 15:22, Russell King (Oracle) wrote:
>>>>> On Fri, May 17, 2024 at 03:09:12PM -0700, Guenter Roeck wrote:
>>>>>> On Fri, Mar 15, 2024 at 11:47:55AM +0000, Russell King (Oracle) wrote:
>>>>>>> Report an error when an attempt to register a clkdev entry results in a
>>>>>>> truncated string so the problem can be easily spotted.
>>>>>>>
>>>>>>> Reported by: Duanqiang Wen <duanqiangwen at net-swift.com>
>>>>>>> Signed-off-by: Russell King (Oracle) <rmk+kernel at armlinux.org.uk>
>>>>>>> Reviewed-by: Stephen Boyd <sboyd at kernel.org>
>>>>>>
>>>>>> With this patch in the mainline kernel, I get
>>>>>>
>>>>>> 10000000.clock-controller:corepll: device ID is greater than 24
>>>>>> sifive-clk-prci 10000000.clock-controller: Failed to register clkdev for corepll: -12
>>>>>> sifive-clk-prci 10000000.clock-controller: could not register clocks: -12
>>>>>> sifive-clk-prci 10000000.clock-controller: probe with driver sifive-clk-prci failed with error -12
>>>>>> ...
>>>>>> platform 10060000.gpio: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10010000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10011000.serial: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10040000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10050000.spi: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>> platform 10090000.ethernet: deferred probe pending: platform: supplier 10000000.clock-controller not ready
>>>>>>
>>>>>> when trying to boot sifive_u in qemu.
>>>>>>
>>>>>> Apparently, "10000000.clock-controller" is too long. Any suggestion on
>>>>>> how to solve the problem ? I guess using dev_name(dev) as dev_id parameter
>>>>>> for clk_hw_register_clkdev() is not or no longer a good idea.
>>>>>> What else should be used instead ?
>>>>>
>>>>> It was *never* a good idea. clkdev uses a fixed buffer size of 20
>>>>> characters including the NUL character, and "10000000.clock-controller"
>>>>> would have been silently truncated to "10000000.clock-cont", and thus
>>>>>
>>>>>                           if (!dev_id || strcmp(p->dev_id, dev_id))
>>>>>
>>>>> would never have matched.
>>>>>
>>>>> We need to think about (a) whether your use of clk_hw_register_clkdev()
>>>>> is still appropriate, and (b) whether we need to increase the size of
>>>>> the strings.
>>>>
>>>> It isn't _my_ use, really. I only run a variety of boot tests with qemu.
>>>> I expect we'll see reports from others trying to boot the mainline kernel
>>>> on real sifive_u hardware or other hardware using the same driver or other
>>>> drivers using dev_name() as dev_id parameter. Coccinelle finds the
>>>> following callers:
>>>
>>> Using dev_name() is not an issue. It's when dev_name() exceeds 19
>>> characters that it becomes an issue (and always has been an issue
>>> due to the truncation.) clk_get(dev, ...) uses dev_name(dev) to match
>>> against its entry in the table.
>>>
>>> As I say, dev_name() itself is not an issue. The length used for the
>>> name is.
>>>
>>
>> Maybe, but the existence of best_dev_name() suggests that this has been seen
>> before and that, as you mentioned, it is not a good idea. Anyway, the patch
>> below fixes the problem for me. I don't know if it is acceptable / correct,
>> so it might serve as guidance for others when fixing the problem for real.
> 
> I get the impression that there's a communication problem here, so I'm
> not going to continue replying. Thanks.
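
For readers following along, the truncation described above can be
reproduced with a small user-space sketch. This is not the clkdev code
itself: store_dev_id() and dev_id_matches() are hypothetical helpers, and
the buffer size of 20 is simply the figure quoted above (the log message
quoted above reports a limit of 24, so treat the constant as
illustrative). It only shows why a silently truncated ID can never
compare equal to the full dev_name() string.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Illustrative size from the discussion above, including the NUL byte. */
#define DEV_ID_BUF_SIZE 20

/*
 * Hypothetical helper: copy a device name into a fixed-size buffer.
 * snprintf() returns the length the full string would have needed, so a
 * return value >= the buffer size means the stored copy was truncated.
 */
static bool store_dev_id(char dst[DEV_ID_BUF_SIZE], const char *name)
{
	int len = snprintf(dst, DEV_ID_BUF_SIZE, "%s", name);

	return len >= 0 && len < DEV_ID_BUF_SIZE;
}

/* Hypothetical helper: the exact-match comparison a lookup would perform. */
static bool dev_id_matches(const char *stored, const char *lookup)
{
	return strcmp(stored, lookup) == 0;
}
```

With "10000000.clock-controller" (25 characters), store_dev_id() reports
truncation, the buffer holds "10000000.clock-cont", and a later
comparison against the full name fails. A name such as "10010000.serial"
fits and matches as expected.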

Hmmm. Communication problem aside, this in the end seems to be a
regression that is caused by a change of yours. Maybe not a major one
that is worth making a fuss about, but still one that would be good to get
fixed. So who will take care of that? Alternatively:

Guenter, I assume we could simply fix this by reverting the culprit if
Linus wanted to?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke