[wpa_supplicant] essid with non-ascii characters
Tue Jul 31 15:01:13 PDT 2012
On Tue, 2012-07-31 at 11:11 +0200, Jouke Witteveen wrote:
> On Mon, Jul 30, 2012 at 6:15 PM, Jouni Malinen <j at w1.fi> wrote:
> > On Mon, Jul 30, 2012 at 02:10:51PM +0200, Jouke Witteveen wrote:
> >> The network is supposedly named "Wifi Château d'Olonne", thus the
> >> experiment shows that wpa_cli substitutes the 'â' with a '_'.
> > If it happens to be encoded as a single character in the SSID.. It could
> > also end up showing up as "__" if multi-byte encoding was used.
> >> If it
> >> would just output whatever bytes are in the SSID, the result of the
> >> printf would be usable in shell scripts to connect to the network.
> > SSID is not a string and it cannot be printed as such. It could even
> > include things like '\0' in it. If you want to get the raw SSID binary
> > data, you can get it from the beginning of the "ie" line in the BSS
> > ctrl_iface command output as a hexdump (starting with two octet header
> > of IE id/len).
> This would be quite cumbersome and it means that the ssid=... part of
> the bss output cannot be used as the ssid=... part of a config
> file. It would be convenient if the SSID reported by scan_results can
> be copied to the config file in many cases. I don't really care about
> SSID's containing '\0': network maintainers that choose to have such
> SSID's deserve to face problems.
> >> The only problems I see with outputting the SSID as-is, is with '\n'
> >> and '\t'. Both mess up the output of `wpa_cli scan_results`. One way
> >> to solve this problem is to have ' ' match all three of them (spaces,
> >> newlines and tabs), another is by introducing escaping.
> > The proper way of handling the SSID is to copy the exact binary data
> > as-is rather than try to pretend that it can be handled as text. As such, the
> > scan_results output is not suitable for this purpose.
> Using printf as in the experiment makes it possible to use
> extraordinary text values:
> printf "%q\n" "â"
> I believe it works for non-printable characters too, so outputting
> whatever octets make up the SSID (perhaps except for '\n', '\t', '\0')
> makes sense to me.
Except as Jouni says, those are valid bytes for an SSID. Perhaps the
bss output could be extended with an ssid_hex=... option that *could* be
fed right back into the ssid= part of the config.
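A rough sketch of that round trip in Python, assuming the "ie" value from the
BSS command is a plain hexdump of concatenated information elements (one octet
id, one octet length, then the body) and that the SSID element has id 0. The
function names are made up for illustration, and the sample bytes are built by
hand rather than taken from a real scan. It also assumes the config parser
accepts an unquoted hex string as the ssid value, which is worth checking
against your wpa_supplicant version.

```python
def ssid_from_ie_hex(ie_hex: str) -> bytes:
    """Return the raw SSID octets from a hexdump of information elements."""
    data = bytes.fromhex(ie_hex.strip())
    pos = 0
    # Each element is: 1 octet id, 1 octet length, then `length` octets of body.
    while pos + 2 <= len(data):
        ie_id, ie_len = data[pos], data[pos + 1]
        body = data[pos + 2 : pos + 2 + ie_len]
        if len(body) != ie_len:
            raise ValueError("truncated information element")
        if ie_id == 0:  # assumed: element id 0 carries the SSID
            return body
        pos += 2 + ie_len
    raise ValueError("no SSID element found")


def config_ssid_line(ssid: bytes) -> str:
    """Format the SSID as an unquoted hex string (hypothetical helper)."""
    return "ssid=" + ssid.hex()


# Hand-built sample: an SSID element followed by one unrelated element.
raw_ssid = b"Wifi Ch\xe2teau d'Olonne"  # Latin-1 bytes, chosen for illustration
sample_ie = bytes([0, len(raw_ssid)]) + raw_ssid + bytes([1, 1, 0x82])

extracted = ssid_from_ie_hex(sample_ie.hex())
print(config_ssid_line(extracted))
```

Nothing here ever treats the SSID as text, so '\0', '\n' and '\t' pass through
unchanged.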
But really, you're not going to get around the fact that SSIDs are not
strings, no matter what. That's the way it is, and applications have to
cope with it. You have no idea what encoding the browser was in when
the user typed in the SSID when configuring their AP, it could be
ShiftJIS or Chinese or UCS2 or something like that. There is no
guarantee that the SSID is printable ASCII. That's not to say the
supplicant couldn't help out a bit. Patches welcome, I'm sure.
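To make the encoding point concrete, here is a small Python sketch of what an
application is up against. The helper names and the list of candidate
encodings are assumptions for illustration only; a successful decode does not
prove that the AP owner actually used that encoding.

```python
def try_decodings(ssid: bytes, encodings=("utf-8", "latin-1", "shift_jis")):
    """Map each candidate encoding to the decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = ssid.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


def is_printable_ascii(ssid: bytes) -> bool:
    """True only if every octet is in the printable ASCII range 0x20..0x7E."""
    return all(0x20 <= b <= 0x7E for b in ssid)


ssid = b"Ch\xe2teau"  # the same octets, read under several encodings
for name, text in try_decodings(ssid).items():
    print(name, repr(text))
print("printable ASCII:", is_printable_ascii(ssid))
```

The octets are the only thing that is certain; any text shown to the user is a
guess layered on top of them.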