Possible solution to the Windows AtomicParsley crashes?

David Woodhouse dwmw2 at infradead.org
Sat Oct 8 14:25:55 EDT 2011


On Sat, 2011-10-08 at 18:37 +0100, dinkypumpkin wrote:
> On 08/10/2011 16:20, David Woodhouse wrote:
> > If you *decode* UTF-8, that implies turning it into actual letters...
> 
> A different "decoding" in this case.  What I mean here is that 
> HTML::Entities::decode_entities() returns a "decoded" string with the 
> utf8 flag set due to the presence of the expanded curly quotes.  The 
> tagging falls over because I didn't check the utf8 flags before using 
> the metadata.  An unfortunate mistake, I know.

Ah, I see: decoding the &rsquo; entity?
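For comparison, here is a rough Python analogue of what HTML::Entities::decode_entities() produces in this case. The standard-library html.unescape() expands &rsquo; into the single character U+2019 RIGHT SINGLE QUOTATION MARK. That code point is above 0xFF, and as I understand it that is why Perl has to switch the string to its wide internal representation (the "utf8 flag"). The title string is made up for illustration and is not real get_iplayer metadata.

```python
import html
import unicodedata

# Hypothetical programme title as it might arrive, with an HTML entity in it.
raw = "Today&rsquo;s Programme"

# Expand the entity: the result is text (a sequence of code points), not bytes.
decoded = html.unescape(raw)

apostrophe = decoded[5]  # the character that replaced "&rsquo;"
print(decoded)
print(hex(ord(apostrophe)), unicodedata.name(apostrophe))
```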

I don't quite understand "utf8 flag" though. It's not a boolean option
"utf8" vs. "not-utf8". Again, there has to be *some* encoding.

Is it ASCII where 0x61 means 'a', or EBCDIC where 0x81 means 'a'? Or
ISO8859-1, or Windows-1252, or something else?

It's not *just* "not utf8".
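To make the "there has to be *some* encoding" point concrete, a small Python sketch: the same character maps to different bytes depending on the encoding chosen, and some encodings cannot represent it at all. The codec names are Python's (cp037 is one EBCDIC code page); the helper function is hypothetical and nothing here is get_iplayer-specific.

```python
def encoded(text, codec):
    """Return the bytes for `text` in `codec`, or None if the codec has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

# One character, 'a', but its byte value depends on the encoding.
print("ascii", encoded("a", "ascii"))   # 0x61
print("cp037", encoded("a", "cp037"))   # 0x81 in EBCDIC

# The curly apostrophe U+2019 gets different bytes in each encoding,
# and ISO 8859-1 has no slot for it at all (None).
for codec in ("utf-8", "cp1252", "iso-8859-1"):
    print(codec, encoded("\u2019", codec))
```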
 
> > That seems wrong. Why do we have a file handle that *isn't* expecting
> > normal UTF-8 text written to it? It can't be just "non-Unicode". It has
> 
> What I mean here is that the handle for the download history file is 
> opened without a Unicode-capable I/O output layer on it, so a warning is 
> generated when the string with expanded curly quotes is written to a 
> history record.  As to why, I can't say.  This probably just never came 
> up when get_iplayer was developed.

Right, so we just have to make sure perl joins us in the 21st century
and allows us to write UTF-8 to text files?

And file a bug against the perl implementations that don't do this by
default, perhaps?
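In Perl the usual way to do that is to put an encoding layer on the handle, for example open(my $fh, '>>:encoding(UTF-8)', $file) or binmode($fh, ':encoding(UTF-8)'); without such a layer perl warns "Wide character in print" and writes the string's internal bytes anyway. The sketch below shows the same idea in Python, where the encoding is named explicitly when the file is opened. The file name and record are hypothetical and do not reflect get_iplayer's actual history format. Note one difference: a Python handle with an encoding that cannot represent the character raises an error instead of warning.

```python
import os
import tempfile

# Hypothetical history record containing the curly apostrophe from the metadata.
record = "Today\u2019s Programme\n"

# Hypothetical history file in a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "download_history")

# Name the encoding explicitly so the result does not depend on the
# platform's default (a legacy code page on many Windows setups).
with open(path, "a", encoding="utf-8") as fh:
    fh.write(record)

# The bytes on disk are UTF-8: U+2019 is stored as E2 80 99.
with open(path, "rb") as fh:
    on_disk = fh.read()
print(on_disk)

# A handle whose encoding has no slot for U+2019 refuses the write.
latin1_failed = False
try:
    with open(path + ".latin1", "a", encoding="iso-8859-1") as fh:
        fh.write(record)
except UnicodeEncodeError:
    latin1_failed = True
print("ISO 8859-1 write failed:", latin1_failed)
```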

-- 
dwmw2


More information about the get_iplayer mailing list