Possible solution to the Windows AtomicParsley crashes?
David Woodhouse
dwmw2 at infradead.org
Sat Oct 8 14:25:55 EDT 2011
On Sat, 2011-10-08 at 18:37 +0100, dinkypumpkin wrote:
> On 08/10/2011 16:20, David Woodhouse wrote:
> > If you *decode* UTF-8, that implies turning it into actual letters...
>
> A different "decoding" in this case. What I mean here is that
> HTML::Entities::decode_entities() returns a "decoded" string with the
> utf8 flag set due to the presence of the expanded curly quotes. The
> tagging falls over because I didn't check the utf8 flags before using
> the metadata. An unfortunate mistake, I know.
Ah, I see — decoding the ’ entity?
I don't quite understand "utf8 flag" though. It's not a boolean option
"utf8" vs. "not-utf8". Again, there has to be *some* encoding.
Is it ASCII where 0x61 means 'a', or EBCDIC where 0x81 means 'a'? Or
ISO8859-1, or Windows-1252, or something else?
It's not *just* "not utf8".
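
To illustrate the point that bytes only mean something once an encoding
is named, here is a minimal Perl sketch using the core Encode module.
It is not taken from get_iplayer; the helper name hex_bytes is made up
for this example, and it assumes the EBCDIC table "cp37" bundled with
Encode is available.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of $chars in the named encoding
# as space-separated uppercase hex.
sub hex_bytes {
    my ($encoding, $chars) = @_;
    my $octets = encode($encoding, $chars);
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $octets;
}

my $quote = "\x{2019}";    # RIGHT SINGLE QUOTATION MARK, one character

# The same letter is a different byte under ASCII and EBCDIC.
printf "%-8s a     = %s\n", $_, hex_bytes($_, 'a') for qw(ascii cp37);

# The same curly quote is three bytes in UTF-8 but one in Windows-1252.
printf "%-8s quote = %s\n", $_, hex_bytes($_, $quote) for qw(UTF-8 cp1252);
```

In other words, the string holds characters, and a concrete encoding
only has to be chosen at the point where those characters become bytes.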
> > That seems wrong. Why do we have a file handle that *isn't* expecting
> > normal UTF-8 text written to it? It can't be just "non-Unicode". It has
>
> What I mean here is that the handle for the download history file is
> opened without a Unicode-capable I/O output layer on it, so a warning is
> generated when the string with expanded curly quotes is written to a
> history record. As to why, I can't say. This probably just never came
> up when get_iplayer was developed.
Right, so we just have to make sure perl joins us in the 21st century
and allows us to write utf-8 to text files?
And file a bug against the perl implementations that don't do this by
default, perhaps?
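
As a rough sketch of what "allowing us to write utf-8 to text files"
could look like, the following opens a handle with an explicit
:encoding(UTF-8) layer, so a string containing a curly quote is written
without a "Wide character" warning. The function names and the
temporary file are assumptions made for the example, not get_iplayer's
actual history-file code.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $text to $path through a UTF-8 output layer.
sub write_utf8 {
    my ($path, $text) = @_;
    open(my $fh, '>:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $path: $!";
    return;
}

# Hypothetical helper: read the file back as raw bytes and return them
# as uppercase hex, to show what actually landed on disk.
sub read_hex {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    local $/;                      # slurp the whole file
    my $bytes = <$fh>;
    close($fh);
    return uc(unpack('H*', $bytes));
}

my (undef, $path) = tempfile(UNLINK => 1);
write_utf8($path, "It\x{2019}s");  # "It's" with a curly apostrophe
print read_hex($path), "\n";       # the quote appears as E2 80 99
```

Declaring the layer on each open is the explicit route; the "open"
pragma can set a default layer for a whole lexical scope instead.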
--
dwmw2