[PATCH] watchdog: at91sam9_wdt: various fixes

boris brezillon b.brezillon at overkiz.com
Sun Nov 3 09:42:11 EST 2013


Hello Guenter,

Are you okay with the 3 fixes/improvements provided by this patch,
and the explanation I gave regarding the reason for these changes ?

If so, I will submit a new series splitting the patch and including your 
suggestions.

Best Regards,

Boris

On 29/10/2013 18:22, boris brezillon wrote:
> On 29/10/2013 17:43, Guenter Roeck wrote:
>> On Tue, Oct 29, 2013 at 05:22:50PM +0100, boris brezillon wrote:
>>> On 29/10/2013 16:45, Guenter Roeck wrote:
>>>> On Tue, Oct 29, 2013 at 11:37:33AM +0100, Boris BREZILLON wrote:
>>>>> Fix the secs_to_ticks macro in case 0 is passed as an argument.
>>>>>
>>>>> Rework the heartbeat calculation to increase the security margin 
>>>>> of the
>>>>> watchdog reset timer.
>>>>>
>>>>> Use the min_heartbeat value instead of the calculated heartbeat 
>>>>> value for
>>>>> the first watchdog reset.
>>>>>
>>>>> Signed-off-by: Boris BREZILLON <b.brezillon at overkiz.com>
>>>> Hi Boris,
>>>>
>>>> can you possibly split the three changes into separate patches ?
>>> Sure. My first idea was to split this in 3 patches, but, as the
>>> buggy at91 watchdog series was
>>> already applied to linux-watchdog-next, I thought it would be faster
>>> to only provide one
>>> patch to fix all the issues at once.
>>>
>>>> The first is a no-brainer. Gives my opinion of my code review 
>>>> capabilities
>>>> a slight damper ;-).
>>>>
>>>> For the other two problems, it might make sense to describe
>>>> the problems you are trying to solve.
>>>>
>>>> Couple of comments inline.
>>>>
>>>> Thanks,
>>>> Guenter
>>>>
>>>>
>>>>> ---
>>>>>   drivers/watchdog/at91sam9_wdt.c |   35 
>>>>> ++++++++++++++++++++++++-----------
>>>>>   1 file changed, 24 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/drivers/watchdog/at91sam9_wdt.c 
>>>>> b/drivers/watchdog/at91sam9_wdt.c
>>>>> index 9bd089e..f1b59f1 100644
>>>>> --- a/drivers/watchdog/at91sam9_wdt.c
>>>>> +++ b/drivers/watchdog/at91sam9_wdt.c
>>>>> @@ -51,7 +51,7 @@
>>>>>   #define ticks_to_hz_rounddown(t)    ((((t) + 1) * HZ) >> 8)
>>>>>   #define ticks_to_hz_roundup(t)        (((((t) + 1) * HZ) + 255) 
>>>>> >> 8)
>>>>>   #define ticks_to_secs(t)        (((t) + 1) >> 8)
>>>>> -#define secs_to_ticks(s)        (((s) << 8) - 1)
>>>>> +#define secs_to_ticks(s)        (s ? (((s) << 8) - 1) : 0)
>>>>     (s)
>>>>
>>>>>   #define WDT_MR_RESET    0x3FFF2FFF
>>>>> @@ -61,6 +61,11 @@
>>>>>   /* Watchdog max delta/value in secs */
>>>>>   #define WDT_COUNTER_MAX_SECS ticks_to_secs(WDT_COUNTER_MAX_TICKS)
>>>>> +/* Watchdog heartbeat shift used for security margin:
>>>>> + * we'll try to rshift the heartbeat value with this value to secure
>>>>> + * the watchdog reset. */
>>>>> +#define WDT_HEARTBEAT_SHIFT    2
>>>>> +
>>>>>   /* Hardware timeout in seconds */
>>>>>   #define WDT_HW_TIMEOUT 2
>>>>> @@ -158,7 +163,9 @@ static int at91_wdt_init(struct 
>>>>> platform_device *pdev, struct at91wdt *wdt)
>>>>>       int err;
>>>>>       u32 mask = wdt->mr_mask;
>>>>>       unsigned long min_heartbeat = 1;
>>>>> +    unsigned long max_heartbeat;
>>>>>       struct device *dev = &pdev->dev;
>>>>> +    int shift;
>>>>>       tmp = wdt_read(wdt, AT91_WDT_MR);
>>>>>       if ((tmp & mask) != (wdt->mr & mask)) {
>>>>> @@ -181,23 +188,27 @@ static int at91_wdt_init(struct 
>>>>> platform_device *pdev, struct at91wdt *wdt)
>>>>>       if (delta < value)
>>>>>           min_heartbeat = ticks_to_hz_roundup(value - delta);
>>>>> -    wdt->heartbeat = ticks_to_hz_rounddown(value);
>>>>> -    if (!wdt->heartbeat) {
>>>>> +    max_heartbeat = ticks_to_hz_rounddown(value);
>>>>> +    if (!max_heartbeat) {
>>>>>           dev_err(dev,
>>>>>               "heartbeat is too small for the system to handle it 
>>>>> correctly\n");
>>>>>           return -EINVAL;
>>>>>       }
>>>>> -    if (wdt->heartbeat < min_heartbeat + 4) {
>>>>> +    for (shift = WDT_HEARTBEAT_SHIFT; shift > 0; shift--) {
>>>>> +        if ((max_heartbeat >> shift) < min_heartbeat)
>>>>> +            continue;
>>>>> +
>>>>> +        wdt->heartbeat = max_heartbeat >> shift;
>>>>> +        break;
>>>>> +    }
>>>>> +
>>>>> +    if (!shift)
>>>>>           wdt->heartbeat = min_heartbeat;
>>>> Correct me if I am wrong, but it seems to me that
>>>>
>>>>     if ((max_heartbeat >> 2) >= min_heartbeat)
>>>>          wdt->heartbeat = max_heartbeat >> 2;
>>>>     else if ((max_heartbeat >> 1) >= min_heartbeat)
>>>>         wdt->heartbeat = max_heartbeat >> 1;
>>>>     else
>>>>         wdt->heartbeat = min_heartbeat;
>>>>
>>>> would accomplish the same and be easier to understand.
>>> This is exactly what I'm trying to accomplish.
>>> I used the for loop in case we ever want to change the
>>> WDT_HEARTBEAT_SHIFT value
>>> (which is unlikely to happen).
>>>
>>> I'll move to your proposition.
>>>
>>>> However, given that, I wonder if it is really necessary to bail out 
>>>> above if
>>>> max_heartbeat is 0. After all, you set heartbeat to min_heartbeat 
>>>> anyway
>>>> in this case.
>>> Yes it is necessary. The max_heartbeat is a configuration that
>>> cannot be changed once configured.
>>> We have to inform the user that his max_heartbeat value is too small
>>> to be handled correctly by the Linux kernel.
>>>
>>> If we simply use the min_heartbeat value as heartbeat and ignore the
>>> wrong max_heartbeat value,
>>> the watchdog will reset the SoC before we can ever reset the
>>> watchdog counter.
>>>
>>>>> +
>>>>> +    if (max_heartbeat < min_heartbeat + 4)
>>>>>           dev_warn(dev,
>>>>>                "min heartbeat and max heartbeat might be too close 
>>>>> for the system to handle it correctly\n");
>>>>> -        if (wdt->heartbeat < 4)
>>>>> -            dev_warn(dev,
>>>>> -                 "heartbeat might be too small for the system to 
>>>>> handle it correctly\n");
>>>>> -    } else {
>>>>> -        wdt->heartbeat -= 4;
>>>>> -    }
>>>>>       if ((tmp & AT91_WDT_WDFIEN) && wdt->irq) {
>>>>>           err = request_irq(wdt->irq, wdt_interrupt,
>>>>> @@ -213,7 +224,9 @@ static int at91_wdt_init(struct 
>>>>> platform_device *pdev, struct at91wdt *wdt)
>>>>>                tmp & wdt->mr_mask, wdt->mr & wdt->mr_mask);
>>>>>       setup_timer(&wdt->timer, at91_ping, (unsigned long)wdt);
>>>>> -    mod_timer(&wdt->timer, jiffies + wdt->heartbeat);
>>>>> +    /* Use min_heartbeat the first time because the watchdog 
>>>>> timer might
>>>>> +     * be running for a long time when we reach this init 
>>>>> function. */
>>>>     /*
>>>>      * Multi-line comment style
>>>>      *
>>>>      * Not really sure I understand what this accomplishes. What 
>>>> problem
>>>>      * are you trying to solve here ?
>>>>      */
>>> Sure, I'll change the comment style.
>>>
>>> What, I'm trying to explain, is that the watchdog might (or should)
>>> have been resetted
>>> before loading the linux kernel. But loading the kernel takes some
>>> time (uncompressing,
>>> low level init, ...), and if we take the heartbeat (or max_heartbeat
>>> / 4 in the common case) value as
>>> the next trigger to reset the watchdog counter, the watchdog timer
>>> might have already expired.
>>>
>> But increasing anything in the watchdog driver itself won't help you 
>> there.
>> You can not execute any kernel code before that kernel code is running.
>
> Of course, but you can at least try to minimize the time between the 
> watchdog driver init
> and the first wathdog counter reset.
>
>>> Here is an example:
>>>   - max_heartbeat configured to 8 seconds
>>>   - min_heartbeat configured to 1 second
>>>   => heartbeat = 2s
>>>   - kernel boot time (before at91 watchdog is loaded) = 6 secs
>>>
>> Guess that is where I got lost. Do you mean the time from loading the 
>> driver
>> to starting the watchdog application ? Because the init function is only
>> executed after the driver is loaded, so nothing will help you if it 
>> takes
>> too long for the driver to load.
>
> I think there is another bug here: the driver should not be compiled 
> as a module.
>
> Here is why:
>
> At POR the at91 SoC configure its watchdog with these default values:
>  - enabled
>  - min heartbeat = 0 ticks
>  - max heartbeat = 0xfff ticks <=> 16 secs
>  - some reset options
> After a POR the watchdog can only be reconfigured once (and only once).
> This configuration oftenly takes place in the the bootstrap (or 
> bootloader),
> but can eventually be done by the Linux kernel.
>
> If the watchdog is kept enabled by the bootstrap (or bootloader), this 
> means
> the linux kernel will have to reset the watchdog counter as soon as 
> possible to avoid
> a possible watchdog reset.
>
> => If we authorize the at91 sam9 watchdog to be compiled as a module, 
> we're not sure
> the module will be loaded soon enough to be able to reset the watchdog 
> counter.
>
>>
>> You really have two times to deal with:
>> - Time from ROMMON handoff to loading the driver
>>    Nothing you can do there. If the watchdog fires before the driver 
>> is loaded,
>>    you are lost. Only way t handle this situation is to increase the 
>> timeout
>>    in the ROMMON.
>> - Time from loading driver to watchdog device open. This is really 
>> the time
>>    you are increasing with your change.
>
> This is where it gets a bit tricky.
>
> The heartbeat I'm talking about is not the "user space" heartbeat
> (struct watchdog_device timeout field), it's the "hardware watchdog 
> counter reset"
> heartbeat (struct at91wdt heartbeat field).
>
> Actually the at91 sam9 wdt driver does not provide a direct access to 
> the watchdog reset
> counter functionnality.
> Instead, it periodically reset the watchdog counter (based on the 
> timing config retrieved from
> the hw registers), and eventually, if the user open a the watchdog dev 
> file and fails to reset
> the watchdog (using the user space API), the drivers stops resetting 
> the hw counter, which leads
> to a watchdog reset.
>
> I hope I'm clear enough, cause it's quite complicated to explain.
>
> Best Regards,
>
> Boris
>
>> Thanks,
>> Guenter
>>
>>> If I use the heartbeat value when configuring the first expiration
>>> of the timer,
>>> it might be too late to reset the watchdog counter.
>>>
>>> I'll try to find a proper to explain this use case :-).
>>>
>>>>> +    mod_timer(&wdt->timer, jiffies + min_heartbeat);
>>>>>       /* Try to set timeout from device tree first */
>>>>>       if (watchdog_init_timeout(&wdt->wdd, 0, dev))
>>>>> -- 
>>>>> 1.7.9.5
>>>>>
>>>>> -- 
>>>>> To unsubscribe from this list: send the line "unsubscribe 
>>>>> linux-watchdog" in
>>>>> the body of a message to majordomo at vger.kernel.org
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>
>>> -- 
>>> To unsubscribe from this list: send the line "unsubscribe 
>>> linux-watchdog" in
>>> the body of a message to majordomo at vger.kernel.org
>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>
>




More information about the linux-arm-kernel mailing list