Invalid XML Entities included in metadata file

Dave Lambley dave at lambley.me.uk
Fri Mar 22 16:16:44 EDT 2013


On 22/03/13 18:36, Roger Bell_West wrote:
> On Fri, Mar 22, 2013 at 06:17:23PM +0000, Ian W Taylor wrote:
>> I think the problem is that get_iplayer uses HTML encode_entities()
>> and there are about 250 entities defined in HTML but only 5 in the
>> XML specification.  I've read that XML just defines &quot; &amp;
>> &apos; &lt; and &gt; for the "&'<> characters.  However the generic
>> metadata XML file includes things like &pound; for the British? Pound
>> sign in the description nodes.
>
> I've had this problem too, and would like not to have to sanitise the
> XML before reading it.
>
>> Perhaps the problem is that a DTD used to be obtained from web site
>> named in the 2nd line of the XML, but is no longer available ?
>> <program_meta_data
>> xmlns="http://linuxcentre.net/xmlstuff/get_iplayer" revision="1">
>
> No, nothing specifically has to be at that URL; it's just a unique
> identifier for the namespace.

Try the attached patch, which switches to numeric entity encoding for XML.
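
For illustration only, here is a rough sketch of what numeric entity
encoding means. This is not the patch itself (get_iplayer is Perl, the
sketch is Python), and the helper name encode_xml_text and the <desc>
element are made up for the example. It escapes the XML-special characters
and writes every non-ASCII character as a numeric character reference,
which any XML parser accepts without a DTD.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def encode_xml_text(text: str) -> str:
    """Hypothetical helper: make text safe for an XML text node."""
    # escape() handles the XML-special characters &, < and >.
    escaped = escape(text)
    # Any remaining non-ASCII character becomes a numeric reference (&#NNN;).
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


NS = "http://linuxcentre.net/xmlstuff/get_iplayer"

description = "Tickets cost \u00a35 & up"  # contains a pound sign
encoded = encode_xml_text(description)

# The namespace URI is only an identifier; nothing is fetched from it.
doc = (
    f'<program_meta_data xmlns="{NS}" revision="1">'
    f"<desc>{encoded}</desc>"  # <desc> is an assumed element name
    "</program_meta_data>"
)
root = ET.fromstring(doc)
decoded = root.find(f"{{{NS}}}desc").text

# By contrast, an HTML-only named entity is rejected by a plain XML parser.
try:
    ET.fromstring("<desc>&pound;</desc>")
    named_entity_ok = True
except ET.ParseError:
    named_entity_ok = False
```

With numeric references the document parses and the description
round-trips unchanged, whereas the &pound; form fails with a parse error.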

Regards,
Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: numeric.patch
Type: text/x-patch
Size: 4676 bytes
Desc: not available
URL: <http://lists.infradead.org/pipermail/get_iplayer/attachments/20130322/cac367b1/attachment.bin>

