Invalid XML Entities included in metadata file

Ian W Taylor ian.wight.taylor at gmail.com
Fri Mar 22 14:17:23 EDT 2013


When using get_iplayer (V2.82) I ask it to produce a "generic" XML 
metadata file.  But when I run xpath on it, xpath says that the XML is 
invalid.

I think the problem is that get_iplayer uses HTML encode_entities() and 
there are about 250 entities defined in HTML but only 5 in the XML 
specification.  I've read that XML just defines &quot; &amp; &apos; &lt; 
and &gt; for the "&'<> characters.  However the generic metadata XML 
file includes things like &pound; for the British Pound sign in the 
description nodes.
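
As a rough illustration of the point above (not part of get_iplayer), the 
following Perl sketch lists the named entity references in a string that 
are not among the five predefined by XML.  The helper name 
find_non_xml_entities and the sample string are made up for this example.

```perl
use strict;
use warnings;

# The five entities predefined by the XML specification.
my %xml_predefined = map { $_ => 1 } qw(quot amp apos lt gt);

# Hypothetical helper: return the named entity references found in $xml
# that are not predefined by XML (sorted, without duplicates).
# Numeric references such as &#163; are deliberately not matched.
sub find_non_xml_entities {
    my ($xml) = @_;
    my %seen;
    while ($xml =~ /&([A-Za-z][A-Za-z0-9]*);/g) {
        $seen{$1}++ unless $xml_predefined{$1};
    }
    return sort keys %seen;
}

# Made-up sample resembling a description node.
my $sample = '<desc>Fish &amp; chips for &pound;5 &lt; &euro;7</desc>';
print join(', ', find_non_xml_entities($sample)), "\n";   # euro, pound
```

Running something like this over a generated metadata file would show 
which HTML-only entities ended up in it.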

I fixed it by changing the call to encode_entities() in substitute() from...

        } elsif ($sanitize_mode == 3) {
             $replace = encode_entities( $value );

to be...

        } elsif ($sanitize_mode == 3) {
             $replace = encode_entities( $value, '"&\'<>' );

And that fixed my problem.  However, I notice that there are loads of 
other places in the code (that it looks like I don't use) with lines 
like "print XML ... encode_entities( ..." and I suspect that they may 
need fixing too.
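
To show the difference between the two calls, here is a minimal 
standalone sketch.  It assumes the HTML::Entities module is installed; the 
sample string is made up and this is not the actual get_iplayer code.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Made-up description text; \x{A3} is the Pound sign.
my $value = "Budget: \x{A3}5 & <more>";

# Default call: also encodes non-ASCII characters as HTML named entities.
my $html_style = encode_entities($value);

# Restricted call: only the five characters XML requires are encoded,
# so the Pound sign is left as a literal character.
my $xml_safe = encode_entities($value, '"&\'<>');

binmode(STDOUT, ':encoding(UTF-8)');
print "$html_style\n";   # Budget: &pound;5 &amp; &lt;more&gt;
print "$xml_safe\n";     # Budget: (Pound sign)5 &amp; &lt;more&gt;
```

Note that with the restricted call the Pound sign is written out as a 
literal character, so the file has to be written in an encoding that 
matches what the XML declares.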

I don't know how to produce a "git patch", if that is the correct term, 
but I do know how to email so I am passing on my suggestion for a fix here.

Testing it ...

As of today there is a podcast on the BBC site that has a Pound sign in 
the description.  It is "Wake_Up_To_Money" episode 
"Money_Budget_day_20_Mar_13" and it can be obtained using --metadataonly.

The following xpath command barfs under Ubuntu.

xpath -e '//desc/text()' Wake_Up_To_Money/Money_Budget_day_20_Mar_13.xml

Under FreeBSD the xpath command takes the filename as the first param 
followed by the XPath queries.  It also says that the XML produced by 
get_iplayer is invalid.  Mind you, both versions of xpath are just Perl 
scripts using XML::XPath.

Perhaps the problem is that a DTD used to be obtained from the web site 
named in the 2nd line of the XML, but is no longer available?
<program_meta_data xmlns="http://linuxcentre.net/xmlstuff/get_iplayer" 
revision="1">

-- 
Regards,
Ian Taylor



