[LEDE-DEV] [PATCH] libubus: initialize list member for ubus_auto_conn's timer
Alexandru Ardelean
ardeleanalex at gmail.com
Tue Dec 13 00:04:54 PST 2016
On Tue, Dec 13, 2016 at 9:52 AM, Felix Fietkau <nbd at nbd.name> wrote:
> On 2016-12-13 08:41, Alexandru Ardelean wrote:
>> Every once in a while, we'll get stacktrace:
>> ```
>> (gdb) bt
>> \#0 0xb7bc4668 in _list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:83
>> \#1 list_del (entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/list.h:90
>> \#2 uloop_timeout_cancel (timeout=timeout at entry=0x10015688 <conn+116>) at /home/sandu/work/Wrt/openwrt/build_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/libubox-2015-06-29/uloop.c:474
>> \#3 0x10003794 in ubus_auto_shutdown (conn=0x10015614 <conn>) at /home/sandu/work/Wrt/openwrt/staging_dir/target-powerpc_8540_uClibc-0.9.33.2_ewok/usr/include/libubus.h:249
>> \#4 module_ubus_fini () at ubus.c:64
>> \#5 0x100013fc in main (argc=<optimized out>, argv=<optimized out>) at module.c:128
>> ```
>>
>> In our code, `ubus_auto_connect()` is called, then due to some logic,
>> the module has to quickly stop, calling ubus_auto_shutdown().
>>
>> It seems that there is case in where the `timer` timeout, is not yet
>> registered with the internal list of timeouts from uloop, which
>> ends up trying to delete an invalid list.
>>
>> So, one solution is to init the list element of the `timer` of
>> the `ubus_auto_conn` struct.
>>
>> Signed-off-by: Alexandru Ardelean <ardeleanalex at gmail.com>
> I think this patch is a workaround that simply hides the real issue.
> uloop_timeout_cancel will only call list_del if the timer 'pending' flag
> is set.
>
> It seems to me that you might be using ubus_auto_connect on
> uninitialized memory. If that's the case, you might run into issues in
> other places as well, and you should just add a memset to your application.
>
> - Felix
Hmmm, you're right.
It is only one module I'm seeing this in.
Maybe it's some weird memory corruption, or race ; since that
ubus_auto_conn struct is init-ed to 0 [since it's static].
Will dig deeper into that.
Sorry for the noise.
More information about the Lede-dev
mailing list