parser error

RS richard22j at zoho.com
Fri Nov 3 05:19:35 PDT 2017


From: Vangelis forthnet
Sent: Friday, November 3, 2017 1:41 AM


>On Fri Oct 27 22:16:03 BST 2017, Ralph Corderoy wrote:

>> If the BBC haven't already been informed
>> that a particular URL serves broken XML
>> then that's the first thing to change,
>> including pointing out the NUL bytes that are causing the problem.
>> I'm sure they'd like to work out what went wrong,
>> and stop it happening again.

> It looks as though the problem has been fixed upstream!
>After navigating to (geo-filtered):

>http://open.live.bbc.co.uk/mediaselector/5/select/version/2.0/mediaset/iptv-all/vpid/b09c79wx

>all three "connection href"s for service="captions"
>load and render perfectly now in Firefox,
>without generating an XML Parsing Error...
>Someone from the BBC staff does browse
>this list or was it perhaps an in-house find?

That’s interesting.  I don’t know whether anyone tried to view Suspicion in 
the iPlayer with subtitles to see if that was affected, but it seems more 
likely that the BBC would respond to something causing an error in the 
iPlayer.  The BBC does correct errors.  When we had problems with missing 
segment errors in HLS, many programmes were corrected a week or so after 
broadcast.

What is more interesting is that neither the file you refer to nor the 
captions file it links to
http://www.bbc.co.uk/iplayer/subtitles/ng/modav/bUnknown-5df25dc8-d38f-43e5-93a2-38b6c778f852_b09c79wx_1509625417009.xml
are XML files.

As I understand it, an XML file has to begin <xml>, have a link in its 
header to the DTD, and end </xml>.

I may be slightly wrong about that.  The problem subtitles file began
<?xml version="1.0" encoding="utf-8"?>

The media selection file you refer to begins
-<mediaSelection>
where - is a dash character I can't copy.  Other <media> tags are preceded 
by a similar dash character.

The captions file begins
-<tt ttp:timeBase="media" xml:lang="en">
where again - is a dash character.

For both files Firefox displays a banner reading
This XML file does not appear to have any style information associated with 
it. The document tree is shown below.
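
One way to test whether a file is well-formed XML, independently of how 
Firefox displays it, is to hand it to a parser.  A minimal sketch, assuming 
Python's standard xml.etree.ElementTree module; the function name and the 
three sample strings are made up for illustration and are not the actual BBC 
files:

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if the parser accepts data as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, loosely modelled on the files discussed above.
samples = {
    "with XML declaration": (
        b'<?xml version="1.0" encoding="utf-8"?>'
        b'<tt xml:lang="en"><body/></tt>'
    ),
    "without declaration or DTD": (
        b'<mediaSelection><media kind="captions"/></mediaSelection>'
    ),
    "containing a NUL byte": b'<tt xml:lang="en"><body>\x00</body></tt>',
}

for name, sample in samples.items():
    print(name, "->", is_well_formed(sample))
```

In this sketch the parser accepts the first two samples, neither of which 
has a DTD reference or a root element named xml, and rejects the third 
because of the NUL byte.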

I have been meaning to reply to Ralph and the others who commented.  I was 
going to do it here, but to avoid making this email any longer I'll do it in 
a separate email, except to draw attention to the Wikipedia article on 
TimedText_Markup_Language which you have already referred to
https://en.wikipedia.org/wiki/Timed_Text_Markup_Language
and in particular reference 2, WebVTT versus TTML: XML considered harmful 
for web captions?
http://www.balisage.net/Proceedings/vol10/html/Tai01/BalisageVol10-Tai01.html

Under the heading "Established industries versus emerging user communities" 
it says,

"While XML has been well received and is used in established industries, it 
has at least a disputable role on the web. The most prominent areas of 
debate are the draconian error handling implemented by XHTML supporting web 
browsers and the growing suppression of XML through JSON as an interchange 
format for data on the web."

Best wishes
Richard




