[PATCH] rtnetlink: add rtnl_lock debug log

Rocco.Yue rocco.yue at mediatek.com
Sat May 8 01:54:34 PDT 2021


> > +static void rtnl_print_btrace(struct timer_list *unused)
> > +{
> > +	pr_info("----------- %s start -----------\n", __func__);
> > +	pr_info("%s[%d][%c] hold rtnl_lock more than 2 sec, start time: %llu\n",
> > +		rtnl_instance.task->comm,
> > +		rtnl_instance.pid,
> > +		task_state_to_char(rtnl_instance.task),
> > +		rtnl_instance.start_time);
> > +	stack_trace_print(rtnl_instance.addrs, rtnl_instance.nr_entries, 0);
> 
> Do we want to print same traces every 2 seconds? 
> 
> Since it is possible to stall for e.g. 30 seconds, printing either only upon
> first call to rtnl_print_btrace() for each stalling duration or only upon
> end of stalling duration (i.e. from rtnl_relase_btrace()) is better?
> 

Thanks for your review, Tetsuo.

rtnl_chk_timer is a one-shot timer, I don't rearm the timer after
expire, therefore traces are only logged once after rtnl_lock was held
for more than 2 seconds, rather than every 2 seconds. In this way, even
if this rtnl_lock debug mechanism is enabled, the overhead is also
affordable in the user load.

As far as I understand it, there won't be the problem you mentioned.

In my experience, it's more appropriate to print trace when
call rtnl_print_btrace(), we can't expect that rtnl_lock will be
released before kernel reboot. And after holding the rtnl_lock for more
than 2s, we will print these logs, which is more helpful for us.

> > +	show_stack(rtnl_instance.task, NULL, KERN_DEBUG);
> 
> Why KERN_DEBUG ?
> 
> If you retrieve the output via dmesg, KERN_DEBUG would be fine.
> But for syzkaller (which counts on printk() messages being printed to
> consoles), KERN_INFO (or default) is expected.
> 

KERN_INFO is expected, I will fix it in patch v2.

> > +	pr_info("------------ %s end -----------\n", __func__);
> > +}
> > +
> > +static void rtnl_relase_btrace(void)
> > +{
> > +	rtnl_instance.end_time = sched_clock();
> > +
> 
> You should del_timer_sync() here than
> 
> > +	if (rtnl_instance.end_time - rtnl_instance.start_time > 2000000000ULL) {
> > +		pr_info("rtnl_lock is held by [%d] from [%llu] to [%llu]\n",
> > +			rtnl_instance.pid,
> > +			rtnl_instance.start_time,
> > +			rtnl_instance.end_time);
> > +	}
> > +
> > +	del_timer(&rtnl_chk_timer);
> 
> here in order to make sure that end message is printed only after
> rtnl_print_btrace() messages are printed.
> 

Will have a separate patch for this.

> > +}
> > +#endif
> > +
> 

Best Regards
Rocco



More information about the Linux-mediatek mailing list