Supplicant Property Change Coalescing Leads to Lost State Changes (was Re: Should Connman Receive All Supplicant State Changes?)

Grant Erickson marathon96
Fri Oct 21 13:35:19 PDT 2011


Following up on this thread:

	http://lists.connman.net/pipermail/connman/2011-July/005641.html

and this one:

	http://lists.shmoo.com/pipermail/hostap/2010-September/021714.html

I have further explored this and after applying this patch series:

	http://lists.connman.net/pipermail/connman/2011-October/006694.html

I confirmed that it can take 5, 10, 20 cycles or more until connman finally sees a 4WAY_HANDSHAKE -> DISCONNECTED transition and successfully disables the network.

I've also seen instances where connman fails to see a COMPLETED state transition because it gets smashed, for instance, by a SCANNING state transition (see another example below).

Because the property getters don't run and DBus messages aren't populated until wpa_dbus_flush_object_changed_properties is called:

	wpa_supplicant_set_state                                        wpa_supplicant.c
	 |
	 '- wpas_notify_state_changed                                   notify.c
		 |
		 |- wpa_supplicant_dbus_notify_state_change                 dbus_old.c
		 |   |
		 |   |- dbus_message_new_signal
		 |   |- wpa_supplicant_state_txt
		 |   |- wpa_supplicant_state_txt
		 |   |- dbus_message_append_args
		 |   '- dbus_connection_send
		 |
		 '- wpas_dbus_signal_prop_changed                           dbus_new.c
			 |
			 |- wpa_dbus_mark_property_changed
			 '- flush_object_timeout_handler
				 |
				 '- wpa_dbus_flush_object_changed_properties        dbus_new_helpers.c
					 |
					 '- send_prop_changed_signal                    dbus_new_helpers.c
						 |
						 |- dbus_message_new_signal
						 |- dbus_message_iter_init_append
						 |- dbus_message_iter_open_container
						 |- put_changed_properties                  dbus_new_helpers.c
						 |- dbus_message_iter_close_container
						 '- dbus_connection_send 

even doing an immediate flush when there's a property change conflict will still result in lost state changes. A representative example:

    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: State: 4WAY_HANDSHAKE -> 4WAY_HANDSHAKE
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: WPA: RX message 3 of 4-Way Handshake from 00:19:5b:60:1d:f3 (ver=2)
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: WPA: Sending EAPOL-Key 4/4
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: WPA: Installing PTK to the driver.
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: wpa_driver_wext_set_key: alg=3 key_idx=0 set_tx=1 seq_len=6 key_len=16
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: External notification - portValid=1
 .- 2011-10-21 18:29:16.132171 daemon.debug wpa_supplicant[1545]: State: 4WAY_HANDSHAKE -> GROUP_HANDSHAKE
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: -> wpa_dbus_mark_property_changed
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     iface 0x71308 path /fi/w1/wpa_supplicant1/Interfaces/0 interface fi.w1.wpa_supplicant1.Interface property State
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** obj_desc 0x828d8 obj_desc->prop_changed_flags 0x82900 ***
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** obj_desc->prop_changed_flags[1] = 0 ***
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** object property timeout is NOT registered ***
 |
 |--- 417.847 ms.
 |
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: <- wpa_dbus_mark_property_changed
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: WPA: Installing GTK to the driver (keyidx=1 tx=0 len=32).
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: wpa_driver_wext_set_key: alg=2 key_idx=1 set_tx=0 seq_len=6 key_len=32
 |  2011-10-21 18:29:16.000000 daemon.notice wpa_supplicant[1545]: WPA: Key negotiation completed with 00:19:5b:60:1d:f3 [PTK=CCMP GTK=TKIP]
 |  2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: Cancelling authentication timeout
 '- 2011-10-21 18:29:16.550018 daemon.debug wpa_supplicant[1545]: State: GROUP_HANDSHAKE -> COMPLETED
    
    We just lost the GROUP_HANDSHAKE state change; wpa_s->wpa_state is now COMPLETED.
    
    2011-10-21 18:29:16.000000 daemon.notice wpa_supplicant[1545]: CTRL-EVENT-CONNECTED - Connection to 00:19:5b:60:1d:f3 completed (auth) [id=0 id_str=]
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: wpa_driver_wext_set_operstate: operstate 0->1 (UP)
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: netlink: Operstate: linkmode=-1, operstate=6
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: wlan0 {RX} 712 packets 75152 bytes
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: -> wpa_dbus_mark_property_changed
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: wlan0 {TX} 842 packets 125084 bytes
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     iface 0x71308 path /fi/w1/wpa_supplicant1/Interfaces/0 interface fi.w1.wpa_supplicant1.Interface property State
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: wlan0 {update} flags 69699 <UP,RUNNING,LOWER_UP>
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** obj_desc 0x828d8 obj_desc->prop_changed_flags 0x82900 ***
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: wlan0 {newlink} index 2 address 00:0c:29:4c:56:a2 mtu 1500
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** obj_desc->prop_changed_flags[1] = 1 ***
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: wlan0 {newlink} index 2 operstate 6 <UP>
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** property State already has a pending change ***
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     *** object property timeout is NOT registered ***
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: <- wpa_dbus_mark_property_changed
    
    Even if we perform an immediate flush here due to the conflict, the state that will be flushed is
    COMPLETED as that's what the wpas_dbus_getter_state will read from the global data structure.
    
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: External notification - portValid=1
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: External notification - EAP success=1
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: SUPP_PAE entering state AUTHENTICATING
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: SUPP_BE entering state SUCCESS
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAP: EAP entering state DISABLED
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: SUPP_PAE entering state AUTHENTICATED
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: Supplicant port status: Authorized
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL: SUPP_BE entering state IDLE
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: EAPOL authentication completed successfully

    The flush runs and out goes a SCN for COMPLETED.

    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: dbus: flush_object_timeout_handler: Timeout - sending changed properties of object /fi/w1/wpa_supplicant1/Interfaces/0
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: -> wpa_dbus_flush_object_changed_properties
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]:     con 0x71aa0 path /fi/w1/wpa_supplicant1/Interfaces/0
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: <- wpa_dbus_flush_object_changed_properties
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: RTM_NEWLINK: operstate=1 ifi_flags=0x11003 ([UP][LOWER_UP])
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: netlink: Operstate: linkmode=-1, operstate=6
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: DHCP: switch listening mode (0 ==> 1)
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: RTM_NEWLINK, IFLA_IFNAME: Interface 'wlan0' added
    2011-10-21 18:29:16.000000 daemon.info connmand[1561]: DHCP: DISCOVER on wlan0 to 255.255.255.255 port 67 interval 3
    2011-10-21 18:29:16.000000 daemon.debug wpa_supplicant[1545]: Wireless event: cmd=0x8c02 len=164
    2011-10-21 18:29:16.000000 daemon.debug connmand[1561]: state completed (9)

This might be mitigated by either:

	1) Special-case state changes, fast tracking them to a direct send_prop_changed_signal.

	2) Make WPA_DBUS_SEND_PROP_CHANGED_TIMEOUT exceptionally small, say 250 ms.

Strong opinions? Both are apt to both simultaneously fix and break existing packages (e.g. connman, flimflam).

Best,

Grant



More information about the Hostap mailing list