[PATCH] tests: add a test for mesh gate forwarding

Jouni Malinen j at w1.fi
Sat Mar 5 09:45:54 PST 2016


On Fri, Mar 04, 2016 at 03:06:31PM -0500, Bob Copeland wrote:
> Hmm, none that I can think of, other than mesh support.  I wrote this to
> test some code refactoring I was doing in this area, but it should have
> also worked before the changes.
> 
> I guess it could fail if the 'iw' command didn't work as expected, or if
> there's a timing issue and the gate announcements aren't received.

At least iw is not printing out any errors when it gets executed during
the test.

> With today's wireless-testing, it's passing for me (the "+" is just for
> some whitespace fixes that aren't upstream yet):
> 
> [    0.000000] Linux version 4.5.0-rc6-wt+ (bob at glass) (gcc version 5.3.1 20160101 (Debian 5.3.1-5) ) #14 SMP PREEMPT Fri Mar 4 14:49:21 EST 2016
> 
> Got a different kernel version or config I should try?
> 
> My config is here: http://bobcopeland.com/srcs/vmconfig.2016-03-04.txt

I tried with the current wireless-testing.git snapshot and with the test
case modified to send multiple test frames:

    # wait for gate announcement frames
    time.sleep(1)

    # data frame from dev2 -> external sta should be sent to both gates
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(1)
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(1)
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(1)
    dev[2].request("DATA_TEST_CONFIG 1")
    dev[2].request("DATA_TEST_TX {} {} 0".format(external_sta, addr2))
    dev[2].request("DATA_TEST_CONFIG 0")
    time.sleep(0.1)
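
For reference, the repeated transmissions above could be written as a loop. This is only a sketch: send_test_frames() is a hypothetical helper name, not something in the hwsim test suite, and it assumes the device object offers the same request() method used above.

```python
import time


def send_test_frames(dev2, dst, src, count=4, gap=1.0, final_gap=0.1):
    """Send `count` test data frames from dev2, pausing between them.

    Hypothetical helper mirroring the repeated block above; dev2 is
    assumed to provide the request() control interface method.
    """
    for i in range(count):
        dev2.request("DATA_TEST_CONFIG 1")
        dev2.request("DATA_TEST_TX {} {} 0".format(dst, src))
        dev2.request("DATA_TEST_CONFIG 0")
        # Shorter pause after the last frame, as in the original sequence.
        time.sleep(final_gap if i == count - 1 else gap)
```

With the defaults this sends four frames with a 1 second gap and a final 0.1 second wait, i.e., the same sequence as the unrolled version.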

I do see Path Request messages getting sent for 02:11:22:33:44:55. On
the first attempt with this, I hit a kernel crash:

[   11.200012] Call Trace:
[   11.200012]  <IRQ> 
[   11.200012]  [<ffffffff810687a9>] ? ttwu_do_wakeup+0x19/0xf0
[   11.200012]  [<ffffffff810691d2>] ? try_to_wake_up+0x192/0x3d0
[   11.200012]  [<ffffffff81445980>] ? mesh_nexthop_resolve+0x140/0x140
[   11.200012]  [<ffffffff81445a19>] mesh_path_timer+0x99/0x110
[   11.200012]  [<ffffffff81094705>] call_timer_fn+0x35/0x160
[   11.200012]  [<ffffffff81094a39>] run_timer_softirq+0x209/0x2a0
[   11.200012]  [<ffffffff81445980>] ? mesh_nexthop_resolve+0x140/0x140
[   11.200012]  [<ffffffff8104a9a2>] __do_softirq+0xd2/0x2b0
[   11.200012]  [<ffffffff8104ad9b>] irq_exit+0x7b/0xa0
[   11.200012]  [<ffffffff81461175>] smp_apic_timer_interrupt+0x45/0x60
[   11.200012]  [<ffffffff8145fc02>] apic_timer_interrupt+0x82/0x90
[   11.200012]  <EOI> 
[   11.200012]  [<ffffffff810371f6>] ? native_safe_halt+0x6/0x10
[   11.200012]  [<ffffffff8100c9ce>] default_idle+0x1e/0x100
[   11.200012]  [<ffffffff8100d22f>] arch_cpu_idle+0xf/0x20
[   11.200012]  [<ffffffff8107ac8a>] default_idle_call+0x2a/0x40
[   11.200012]  [<ffffffff8107aef3>] cpu_startup_entry+0x253/0x330
[   11.200012]  [<ffffffff8102ce23>] start_secondary+0x103/0x110
[   11.200012] Code: 1f 44 00 00 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 68 48 89 7d 80 48 8b 47 28 48 89 85 70 ff ff ff 48 8b 80 e8 09 00 00 <48> 8b 40 08 48 85 c0 48 89 85 78 ff ff ff 0f 84 d2 03 00 00 e8 
[   11.200012] RIP  [<ffffffff8144191c>] mesh_path_send_to_gates+0x2c/0x480
[   11.200012]  RSP <ffff88001fd83dd0>
[   11.200012] CR2: 0000000000000008
[   11.200012] ---[ end trace 6fdda66d273fb377 ]---
[   11.200012] Kernel panic - not syncing: Fatal exception in interrupt
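
As a side note on reading the trace: the byte marked with <> in the Code: line is where the faulting instruction starts. The small sketch below pulls out those bytes; faulting_bytes() is a hypothetical helper written just for this illustration. For the line above it yields 48 8b 40 08, which on x86-64 should decode to "mov 0x8(%rax),%rax", i.e., a load from offset 8 of the pointer in RAX. That would be consistent with CR2 being 0x8 if RAX was NULL.

```python
import re


def faulting_bytes(code_line, count=4):
    """Return `count` bytes starting at the <xx> marker of an oops Code: line.

    Returns None if the line has no Code: field or no marked byte.
    """
    _, sep, rest = code_line.partition("Code:")
    if not sep:
        return None
    tokens = rest.split()
    for i, tok in enumerate(tokens):
        if re.fullmatch(r"<[0-9a-fA-F]{2}>", tok):
            # Strip the <> marker from the first byte, keep the following ones.
            window = [tok.strip("<>")] + tokens[i + 1:i + count]
            return bytes(int(b, 16) for b in window)
    return None
```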


This was with my work branch for the kernel with a mesh compilation
warning silenced. When I tried again with unmodified master branch, I
did get the test to pass, but only with that extra time added to the
end. The first Data frame with the mesh extended addresses showed up at
5.6 sec offset from the beginning of the test case, i.e., much later
than the 1 second wait would be able to cover.
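
Given that timing, a fixed sleep looks fragile. One option would be to poll for the expected condition with a deadline instead. The following is a minimal sketch under assumptions: wait_until() is a hypothetical helper, and the predicate that detects the forwarded frame is left to the caller since this thread does not show how the test checks for it.

```python
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns a true value or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

The test would then fail only if the condition never becomes true within the timeout, and it would finish early in runs where the frames show up quickly.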

Could you please share the wpas_mesh_gate_forwarding.hwsim0.pcapng file
from a test case run that shows the expected behavior?


I don't see how the change I had in net/mac80211/mesh_hwmp.c could have
caused the panic. All it does is initialize a variable:

hwmp_preq_frame_process()
-       u32 orig_sn, target_sn, lifetime, target_metric;
+       u32 orig_sn, target_sn, lifetime, target_metric = 0;



This kernel panic does not happen every time, i.e., I can pass the test
case with my work branch as well. 

The kernel panic hit here:

int mesh_path_send_to_gates(struct mesh_path *mpath)

    tbl = sdata->u.mesh.mesh_paths;
    known_gates = tbl->known_gates;

The crash case looks like this:
[    8.770098] JKM:mesh_path_send_to_gates:tbl=ffff88001eb48e00
[   11.916288] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[   11.931385] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[   11.946013] IPv6: ADDRCONF(NETDEV_UP): wlan2: link is not ready
[   11.970031] JKM:mesh_path_send_to_gates:tbl=          (null)
[   11.971126] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

I.e., the second call to mesh_path_send_to_gates() has
sdata->u.mesh.mesh_paths NULL. Is that broken somewhere else or should
this function check for that NULL case to avoid the crash?

When the test case passes, it happens way before that 11.9 second
offset, but I'm not completely sure what causes the difference between
test runs.

-- 
Jouni Malinen                                            PGP id EFC895FA


