parser error

RS richard22j at zoho.com
Wed Oct 25 16:51:15 PDT 2017


From: Vangelis forthnet
Sent: Wednesday, October 25, 2017 6:14 AM

>On Tue Oct 24 20:35:11 BST 2017, RS wrote:

>> The resultant .mp4 file can be played in VLC,
>> but MediaInfo shows no metadata.

Hello Vangelis

Many thanks for a very thorough explanation.

>If you ended up, for whatever reason,
>with an untagged file, you can always (re-)tag
>post download with the --tag-only switch:

>get_iplayer --type=video --pid=b00gmlrx --tag-only --tag-podcast-tv --tag-only-filename="path\to\Suspicion.mp4"

Thanks, that's useful.  Up to now I have assumed that tagging requires a 
massive data collection exercise, so I have downloaded afresh (which is not 
as great a hardship as it used to be now we have resuming for HLS and HVF) 
if something goes wrong.

>(I assume you renamed the "Suspicion.partial.mp4" to just "Suspicion.mp4")

No, I didn't think of that, so it is something else which is strange.  If 
downloading subtitles fails (other than because there are no subtitles) 
get_iplayer skips tagging by AtomicParsley and renames the partial file as 
though nothing is wrong.

I have received an email from someone who told me the Suspicion subtitles 
download fine on his XP installation with v3.01.  When I use v3.01 I still 
get the problem.  He also mentioned that there was something in the v3.01 
release notes about changes to subtitle handling.  I used --subsfmt=default 
and the subtitles downloaded without problem.  I can't do that in v3.05 
because --subsfmt has been removed.

> ...

>> Does anyone have any idea what causes a parser error?

>Answered by Colin; some further analysis below...

The corruption he refers to is a few spurious NUL characters in 
<head><metadata>.  The subtitles themselves are in <body> and they are 
intact.
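
For anyone who wants to script the clean-up, here is a minimal Python sketch of the idea: strip the NUL bytes, then check that the result parses. It assumes the raw subtitles file is UTF-8 (in UTF-16 a NUL byte can be legitimate), and the function names and sample data are made up for illustration; this is not how get_iplayer itself handles the file.

```python
import xml.etree.ElementTree as ET


def strip_nuls(raw: bytes) -> bytes:
    """Remove NUL bytes, which are not legal characters in XML 1.0.

    Assumption: the input is UTF-8 (or another single-byte-compatible
    encoding), so a NUL byte can never be part of a valid character.
    """
    return raw.replace(b"\x00", b"")


def is_well_formed(raw: bytes) -> bool:
    """Return True if the bytes parse as well-formed XML."""
    try:
        ET.fromstring(raw)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample mimicking the damage: NULs inside <head><metadata>,
# with the subtitle text in <body> left intact.
sample = (
    b'<tt xmlns="http://www.w3.org/ns/ttml"><head><metadata>\x00\x00'
    b'</metadata></head><body><div>'
    b'<p begin="00:00:01.000" end="00:00:03.000">Hello</p>'
    b'</div></body></tt>'
)

cleaned = strip_nuls(sample)
print(is_well_formed(sample), is_well_formed(cleaned))
```

The cleaned bytes can then be written back to disk and fed to whatever converter is being used.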

>On Tue Oct 24 21:41:54 BST 2017, RS wrote:

>> I'm glad I asked because I hadn't realised
>> that was where subtitles came from.
>> I had assumed there was a ready-made .srt
>> file to download.

>On-line media portals (like iPlayer) rarely use the .srt
>(subrip text) format, because it's usually incompatible
>with their embedded player (Flash based/HTML5 one);
>I'm certainly not an expert on this subject, but Flash
>based players usually require an XML caption file
>(referred to also as DFXP), while HTML5 ones
>may use the WebVTT (.vtt) format.

>DFXP is a timed-text format that was developed by W3C
>(stands for "Distribution Format Exchange Profile"); it is
>currently referred to as TTML, read more at:
>https://en.wikipedia.org/wiki/Timed_Text_Markup_Language

I didn't know any of this, or the paragraphs I have not quoted, so I am 
grateful for the explanation.  I was aware of --subsraw, but that does not 
solve the problem.  All the players I have used have accepted .srt files.

I suspect the answer is that XML::LibXML is less tolerant than whatever was 
used before (XML::Parser or XML::Simple?).

One good thing to come out of it is that I found a program (which you also 
mention and which I have not yet installed) called Subtitle Edit which will 
convert XML to SubRip.  It will also translate subtitles to other languages 
which could be useful with visitors whose mother tongue is not English.


>> I see from --info there are three subtitle modes.

>I used GiP 3.05 and the following command:

>perl get_iplayer-305w.pl --type=tv --pid=b00gmlrx -i --streaminfo > Streams.txt 2>&1

>and yes, there are 3 caption modes identified,
>but, alas, I can sure tell there's a bug in the
>detection scheme somewhere; no sign of the
>legacy format, plus there's duplication, as

> ...

>but all three point to the same file!

You got the same as me, with 1 and 3 pointing to Limelight and 2 to Akamai. 
Subsequently it changed, so there is now only subtitles1 (CDN: sis/10).

> ...


>With the aid of Fx's Page Source and
>a Text Editor, I managed to reconstitute
>a proper TTML file, then used SubtitleEdit
>to convert to (monochrome) .srt.
>If you're in need of it, contact me off-list...

Many thanks for the offer but I now have a file from which I can delete the 
spurious characters to create valid XML and feed to Subtitle Edit.  I also 
have a .srt file which I downloaded with v3.01 and --subsfmt=default.

Best wishes
Richard




