[PATCHv5 1/4] watchdog: Extend kernel API to know about HW limitations
Yang, Wenyou
Wenyou.Yang at atmel.com
Mon Apr 13 22:44:13 PDT 2015
> -----Original Message-----
> From: Timo Kokkonen [mailto:timo.kokkonen at offcode.fi]
> Sent: 2015年4月14日 13:32
> To: Yang, Wenyou; linux-arm-kernel at lists.infradead.org; linux-
> watchdog at vger.kernel.org; boris.brezillon at free-electrons.com; Ferre, Nicolas;
> alexandre.belloni at free-electrons.com
> Subject: Re: [PATCHv5 1/4] watchdog: Extend kernel API to know about HW
> limitations
>
> On 14.04.2015 05:39, Yang, Wenyou wrote:
> >
> >
> > On 2015/4/13 19:00, Timo Kokkonen wrote:
> >> On 13.04.2015 12:29, Yang, Wenyou wrote:
> >>> Hi Timo,
> >>>
> >>> If the /dev/watchdog device file doesn't open by the user space
> >>> application, the system will reboot by the watchdog.
> >>>
> >>
> >> Thank you for pointing out the bug. When the struct watchdog_device
> >> is initialized, the expires variable is zero. Jiffies however is
> >> negative on boot up, so time_after() comparison with zero returns
> >> false and the worker keeps the watchdog alive for some minutes, until
> >> jiffies become positive. Then it will stop updating the HW as the
> >> expires value has become expired.
> >>
> >>> I think it is missing update of the wdd->expires value. added it as
> >>> following.
> >>
> >> Actually, the expires value stores the user space expiration value.
> >> It should only be updated when user space pings the watchdog. If we
> >> increment the expires value in the worker thread, the watchdog will
> >> never expire even if user space stops pinging it and keep it open.
> >>
> >> So the proper fix is to have it like this (in watchdog_worker function):
> >>
> >> if (time_after(jiffies, wdd->expires) && watchdog_active(wdd)) {
> >> pr_crit("I will reset your machine !\n");
> >> return;
> >> }
> > I have a little doubt, if the watchdog is not opened by the user
> > space, the watchdog will not reset the system in any case.
> >
> > it doesn't work in the same way as before in the at91sam9_wdt.c.
>
> That's why the timer was originally in at91sam9_wdt, it kept on pinging the
> watchdog HW until user space opened it. That's why I started working on the
> early-timeout-sec properlty in the first place. And that is the same pattern that
> repeats all over the watchdog drivers.
>
> >>
> >> Also we need this change to re-start the worker when watchdog is
> >> properly stopped (in watchdog_dev.c):
> >>
> >> @@ -157,9 +159,15 @@ static int watchdog_stop(struct watchdog_device
> >> *wddev)
> >> goto out_stop;
> >> }
> >>
> >> - err = wddev->ops->stop(wddev);
> >> - if (err == 0)
> >> + if (!wddev->hw_max_timeout || watchdog_is_stoppable(wddev)) {
> >> + err = wddev->ops->stop(wddev);
> >> + if (err == 0)
> >> + clear_bit(WDOG_ACTIVE, &wddev->status);
> >> + } else {
> >> + /* Unstoppable watchdogs need the worker to keep them
> >> alive */
> >> clear_bit(WDOG_ACTIVE, &wddev->status);
> >> + schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >> + }
> > Tested it, there is a issue.
> > After open the watchdog device file, and feed the watchdog, then close
> > the watchdog device file, the system reset by the watchdog.
> > It didn't reach here, the watchdog_worker didn't run.
> >
>
> If you just close the device, watchdog is not stopped. That is to
> prevent watchdog daemon crash from hanging the system. You need to feed
> the magic char 'V' in order to stop the watchdog, which will trigger the
> start of the worker. Have you tried that?
Yes, if feed the magic char "V", then close the watchdog device file, it works fine.
>
> I noticed some other issues with the patches. I'll send a new one with
> all fixes combined shortly..
>
> Thanks for your feedback.
>
> -Timo
>
> >>
> >> out_stop:
> >> mutex_unlock(&wddev->lock);
> >>
> >>
> >> After these two modifications the watchdog is being pinged
> >> indefinitely until the user space opens the device and the worker is
> >> re-started if watchdog daemon exits after writing the magic character.
> >> I will update these change to the next version of the patch.
> >>
> >> Thanks again!
> >>
> >> -Timo
> >>
> >>> On 2015/4/10 21:06, Timo Kokkonen wrote:
> >>>> There is a great deal of diversity in the watchdog hardware found on
> >>>> different devices. Differen hardware have different contstraints on
> >>>> them, many of the constraints that are excessively difficult for the
> >>>> user space to satisfy.
> >>>>
> >>>> One such constraint is the length of the timeout value, which in many
> >>>> cases can be just a few seconds. Drivers are creating ad hoc solutions
> >>>> with timers and workqueues to extend the timeout in order to give user
> >>>> space more time between updates. Looking at the drivers it is clear
> >>>> that this has resulted to a lot of duplicate code.
> >>>>
> >>>> Add an extension to the watchdog kernel API that allows the driver to
> >>>> describe tis HW constraints to the watchdog code. A kernel worker in
> >>>> the core is then used to extend the watchdog timeout on behalf of the
> >>>> user space. This allows the drivers to be simplified as core takes
> >>>> over the timer extending.
> >>>>
> >>>> Signed-off-by: Timo Kokkonen <timo.kokkonen at offcode.fi>
> >>>> ---
> >>>> drivers/watchdog/watchdog_core.c | 87
> >>>> ++++++++++++++++++++++++++++++++++++++--
> >>>> drivers/watchdog/watchdog_dev.c | 12 ++++++
> >>>> include/linux/watchdog.h | 23 +++++++++++
> >>>> 3 files changed, 119 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/watchdog/watchdog_core.c
> >>>> b/drivers/watchdog/watchdog_core.c
> >>>> index cec9b55..8e7d08d 100644
> >>>> --- a/drivers/watchdog/watchdog_core.c
> >>>> +++ b/drivers/watchdog/watchdog_core.c
> >>>> @@ -99,6 +99,75 @@ int watchdog_init_timeout(struct watchdog_device
> >>>> *wdd,
> >>>> EXPORT_SYMBOL_GPL(watchdog_init_timeout);
> >>>> /**
> >>>> + * watchdog_init_timeout() - initialize generic watchdog parameters
> >>>> + * @wdd: Watchdog device to operate
> >>>> + * @dev: Device that stores the device tree properties
> >>>> + *
> >>>> + * Initialize the generic timeout parameters. The driver needs to set
> >>>> + * hw_features bitmask from @wdd prior calling this function in order
> >>>> + * to ensure the core knows how to handle the HW.
> >>>> + *
> >>>> + * A zero is returned on success and -EINVAL for failure.
> >>>> + */
> >>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
> >>>> *dev)
> >>>> +{
> >>>> + int ret = 0;
> >>>> +
> >>>> + ret = watchdog_init_timeout(wdd, wdd->timeout, dev);
> >>>> + if (ret < 0)
> >>>> + return ret;
> >>>> +
> >>>> + /*
> >>>> + * Max HW timeout needs to be set so that core knows when to
> >>>> + * use a kernel worker to support longer watchdog timeouts
> >>>> + */
> >>>> + if (!wdd->hw_max_timeout)
> >>>> + return -EINVAL;
> >>>> +
> >>>> + return ret;
> >>>> +}
> >>>> +EXPORT_SYMBOL_GPL(watchdog_init_params);
> >>>> +
> >>>> +static void watchdog_worker(struct work_struct *work)
> >>>> +{
> >>>> + struct watchdog_device *wdd = container_of(to_delayed_work(work),
> >>>> + struct watchdog_device, work);
> >>>> +
> >>>> + if (time_after(jiffies, wdd->expires)) {
> >>>> + pr_crit("I will reset your machine !\n");
> >>>> + return;
> >>>> + }
> >>>> +
> >>>> + if ((!watchdog_active(wdd) && !watchdog_is_stoppable(wdd)) ||
> >>>> + (watchdog_active(wdd) &&
> >>>> + wdd->hw_max_timeout < wdd->timeout * HZ)) {
> >>>> + if (wdd->ops->ping)
> >>>> + wdd->ops->ping(wdd);
> >>>> + else
> >>>> + wdd->ops->start(wdd);
> >>>> +
> >>>> + schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> >>> I think you missed the update of wdd->expires, as following.
> >>> wdd->expires = jiffies +
> >>> wdd->timeout * HZ;
> >>>> + }
> >>>> +}
> >>>> +
> >>>> +static int prepare_watchdog(struct watchdog_device *wdd)
> >>>> +{
> >>>> + /* Stop the watchdog now before user space opens the device */
> >>>> + if (wdd->hw_features & WDOG_HW_IS_STOPPABLE &&
> >>>> + wdd->hw_features & WDOG_HW_RUNNING_AT_BOOT) {
> >>>> + wdd->ops->stop(wdd);
> >>>> +
> >>>> + } else if (!(wdd->hw_features & WDOG_HW_IS_STOPPABLE)) {
> >>>> + /*
> >>>> + * Can't stop it, use a kernel timer to tick
> >>>> + * it until it's open by user space
> >>>> + */
> >>>> + schedule_delayed_work(&wdd->work, wdd->hw_heartbeat);
> >>>> + }
> >>> ditto
> >>> wdd->expires = jiffies + wdd->timeout * HZ;
> >>>
> >>>> + return 0;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> * watchdog_register_device() - register a watchdog device
> >>>> * @wdd: watchdog device
> >>>> *
> >>>> @@ -156,13 +225,24 @@ int watchdog_register_device(struct
> >>>> watchdog_device *wdd)
> >>>> wdd->dev = device_create(watchdog_class, wdd->parent, devno,
> >>>> NULL, "watchdog%d", wdd->id);
> >>>> if (IS_ERR(wdd->dev)) {
> >>>> - watchdog_dev_unregister(wdd);
> >>>> - ida_simple_remove(&watchdog_ida, id);
> >>>> ret = PTR_ERR(wdd->dev);
> >>>> - return ret;
> >>>> + goto dev_create_fail;
> >>>> + }
> >>>> +
> >>>> + INIT_DELAYED_WORK(&wdd->work, watchdog_worker);
> >>>> +
> >>>> + if (wdd->hw_max_timeout) {
> >>>> + ret = prepare_watchdog(wdd);
> >>>> + if (ret)
> >>>> + goto dev_create_fail;
> >>>> }
> >>>> return 0;
> >>>> +
> >>>> +dev_create_fail:
> >>>> + watchdog_dev_unregister(wdd);
> >>>> + ida_simple_remove(&watchdog_ida, id);
> >>>> + return ret;
> >>>> }
> >>>> EXPORT_SYMBOL_GPL(watchdog_register_device);
> >>>> @@ -181,6 +261,7 @@ void watchdog_unregister_device(struct
> >>>> watchdog_device *wdd)
> >>>> if (wdd == NULL)
> >>>> return;
> >>>> + cancel_delayed_work_sync(&wdd->work);
> >>>> devno = wdd->cdev.dev;
> >>>> ret = watchdog_dev_unregister(wdd);
> >>>> if (ret)
> >>>> diff --git a/drivers/watchdog/watchdog_dev.c
> >>>> b/drivers/watchdog/watchdog_dev.c
> >>>> index 6aaefba..99eb363 100644
> >>>> --- a/drivers/watchdog/watchdog_dev.c
> >>>> +++ b/drivers/watchdog/watchdog_dev.c
> >>>> @@ -78,6 +78,12 @@ static int watchdog_ping(struct watchdog_device
> >>>> *wddev)
> >>>> else
> >>>> err = wddev->ops->start(wddev); /* restart watchdog */
> >>>> + if (wddev->hw_max_timeout &&
> >>>> + wddev->timeout * HZ > wddev->hw_max_timeout) {
> >>>> + wddev->expires = jiffies + wddev->timeout * HZ;
> >>>> + schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >>>> + }
> >>>> +
> >>>> out_ping:
> >>>> mutex_unlock(&wddev->lock);
> >>>> return err;
> >>>> @@ -110,6 +116,12 @@ static int watchdog_start(struct watchdog_device
> >>>> *wddev)
> >>>> if (err == 0)
> >>>> set_bit(WDOG_ACTIVE, &wddev->status);
> >>>> + if (wddev->hw_max_timeout &&
> >>>> + wddev->timeout * HZ > wddev->hw_max_timeout) {
> >>>> + wddev->expires = jiffies + wddev->timeout * HZ;
> >>>> + schedule_delayed_work(&wddev->work, wddev->hw_heartbeat);
> >>>> + }
> >>>> +
> >>>> out_start:
> >>>> mutex_unlock(&wddev->lock);
> >>>> return err;
> >>>> diff --git a/include/linux/watchdog.h b/include/linux/watchdog.h
> >>>> index 395b70e..c16c921 100644
> >>>> --- a/include/linux/watchdog.h
> >>>> +++ b/include/linux/watchdog.h
> >>>> @@ -12,6 +12,7 @@
> >>>> #include <linux/bitops.h>
> >>>> #include <linux/device.h>
> >>>> #include <linux/cdev.h>
> >>>> +#include <linux/workqueue.h>
> >>>> #include <uapi/linux/watchdog.h>
> >>>> struct watchdog_ops;
> >>>> @@ -62,9 +63,13 @@ struct watchdog_ops {
> >>>> * @timeout: The watchdog devices timeout value.
> >>>> * @min_timeout:The watchdog devices minimum timeout value.
> >>>> * @max_timeout:The watchdog devices maximum timeout value.
> >>>> + * @hw_max_timeout:The watchdog hardware maximum timeout value.
> >>>> + * @hw_margin: Time interval between timer pings.
> >>>> * @driver-data:Pointer to the drivers private data.
> >>>> * @lock: Lock for watchdog core internal use only.
> >>>> + * @work: Worker used to provide longer timeouts than hardware
> >>>> supports.
> >>>> * @status: Field that contains the devices internal status bits.
> >>>> + * @hw_features:Feature bits describing how the watchdog HW works.
> >>>> *
> >>>> * The watchdog_device structure contains all information about a
> >>>> * watchdog timer device.
> >>>> @@ -86,8 +91,12 @@ struct watchdog_device {
> >>>> unsigned int timeout;
> >>>> unsigned int min_timeout;
> >>>> unsigned int max_timeout;
> >>>> + unsigned int hw_max_timeout; /* in jiffies */
> >>>> + unsigned int hw_heartbeat; /* in jiffies */
> >>>> + unsigned long int expires; /* for soft timer */
> >>>> void *driver_data;
> >>>> struct mutex lock;
> >>>> + struct delayed_work work;
> >>>> unsigned long status;
> >>>> /* Bit numbers for status flags */
> >>>> #define WDOG_ACTIVE 0 /* Is the watchdog running/active */
> >>>> @@ -95,6 +104,14 @@ struct watchdog_device {
> >>>> #define WDOG_ALLOW_RELEASE 2 /* Did we receive the magic char
> >>>> ? */
> >>>> #define WDOG_NO_WAY_OUT 3 /* Is 'nowayout' feature set ? */
> >>>> #define WDOG_UNREGISTERED 4 /* Has the device been
> >>>> unregistered */
> >>>> +
> >>>> +/* Bits describing features supported by the HW */
> >>>> + unsigned long hw_features;
> >>>> +
> >>>> +/* Can the watchdog be stopped and started */
> >>>> +#define WDOG_HW_IS_STOPPABLE BIT(0)
> >>>> +/* Is the watchdog already running when the driver starts up */
> >>>> +#define WDOG_HW_RUNNING_AT_BOOT BIT(1)
> >>>> };
> >>>> #define WATCHDOG_NOWAYOUT
> IS_BUILTIN(CONFIG_WATCHDOG_NOWAYOUT)
> >>>> @@ -120,6 +137,11 @@ static inline bool
> >>>> watchdog_timeout_invalid(struct watchdog_device *wdd, unsigne
> >>>> (t < wdd->min_timeout || t > wdd->max_timeout));
> >>>> }
> >>>> +static inline bool watchdog_is_stoppable(struct watchdog_device *wdd)
> >>>> +{
> >>>> + return wdd->hw_features & WDOG_HW_IS_STOPPABLE;
> >>>> +}
> >>>> +
> >>>> /* Use the following functions to manipulate watchdog driver
> >>>> specific data */
> >>>> static inline void watchdog_set_drvdata(struct watchdog_device *wdd,
> >>>> void *data)
> >>>> {
> >>>> @@ -134,6 +156,7 @@ static inline void *watchdog_get_drvdata(struct
> >>>> watchdog_device *wdd)
> >>>> /* drivers/watchdog/watchdog_core.c */
> >>>> extern int watchdog_init_timeout(struct watchdog_device *wdd,
> >>>> unsigned int timeout_parm, struct device *dev);
> >>>> +int watchdog_init_params(struct watchdog_device *wdd, struct device
> >>>> *dev);
> >>>> extern int watchdog_register_device(struct watchdog_device *);
> >>>> extern void watchdog_unregister_device(struct watchdog_device *);
> >>> Best Regards,
> >>> Wenyou Yang
> >>
> > Best Regards,
> > Wenyou Yang
> >
Best Regards,
Wenyou Yang
More information about the linux-arm-kernel
mailing list