Subtitles, Round 2

Vangelis forthnet northmedia1 at the.forthnet.gr
Wed Sep 25 03:09:35 EDT 2013


On Tue Sep 24 20:50:08 BST 2013, dinkypumpkin wrote:

>...I decided to... iron out some subtitles-related issues
>raised in the last couple of days.
>
>The patch against Git HEAD is here:
>https://github.com/dinkypumpkin/get_iplayer/compare/feature;subtitles.patch
>With this patch, subtitles should be formatted as they would be in the
>Flash player on the iPlayer site (incl. explicit line breaks)...
>
>It would be helpful if anyone interested in getting subtitles sorted
>would have a go at testing this.
>Please do it soon -
>I want to resolve this so I can get another release out.

Well, first of all, I am only a very occasional TV downloader,
for reasons errr.... probably known to other list members;
I'd say 2-3 shows a month describes my habits - and
this is niche content I can't find elsewhere. So I am not
your ideal tester for this patch...
 Having said that, I do sometimes use GiP to fetch subtitles
for BBC content acquired through other means (if I were in
the US, I'd say "I plead the Fifth"...).
 I have applied your linked patch to my local copy of the
get_iplayer.pl script, then used the patched version to
re-download the subs for the programme with pid=b03bjpcy
(see my previous mail here:
 http://lists.infradead.org/pipermail/get_iplayer/2013-September/005055.html 
 )

get_iplayer --pid=b03bjpcy --subtitles-only --force

The above yielded a 60.9 KB .srt file, which I tested
together with its corresponding video file in
MPC-BE v1.2.0.3.2938.
I have to declare that it is a VAST improvement over the status
quo ante; indeed, in this test file the subs are identical to how
they are presented on-line on iPlayer - this also means that
I "got hit" by a very rare "three-liner" in subtitle no. 663:

663
00:52:54,360 --> 00:52:58,800
on macaques
that had been deliberately given
Parkinson's Disease.

but this is how the Beeb made it; not a fault of GiP's.
I fixed it manually to:

663
00:52:54,360 --> 00:52:58,800
on macaques that had been deliberately
given Parkinson's Disease.

It would be heaven-perfect if GiP made sure that every
subtitle does not exceed two lines, but I do realise this is
too much to ask...
So, a sterling patch it is then!
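
The manual fix above could be automated along these lines. This is only
a minimal sketch in Python, not part of get_iplayer or the patch:
rewrap_cue is a hypothetical helper name, and the rule it uses (pick the
narrowest line width at which the text fits in two lines) is an
assumption about what a sensible layout looks like.

```python
import math
import textwrap


def rewrap_cue(lines, max_lines=2):
    """Re-split one cue's text lines into at most max_lines lines.

    Hypothetical helper: cues that already fit are returned unchanged.
    """
    if len(lines) <= max_lines:
        return list(lines)
    # Join the original lines into one string, dropping empty ones.
    text = " ".join(line.strip() for line in lines if line.strip())
    # Start from an even split and widen until the text fits.
    width = max(1, math.ceil(len(text) / max_lines))
    while True:
        wrapped = textwrap.wrap(text, width=width)
        if len(wrapped) <= max_lines:
            return wrapped
        width += 1


cue_663 = [
    "on macaques",
    "that had been deliberately given",
    "Parkinson's Disease.",
]
print("\n".join(rewrap_cue(cue_663)))
```

Only the text lines of a cue are passed in; the index and timing lines
are left untouched, so the sync is not affected.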

>The(re) may be other format variations lurking
>in iPlayer that I would never come across.
>
>If you check subtitles during playback and find big chunks of dialogue
>missing, first check the raw subtitles files before blaming get_iplayer.
>I have found several programmes with missing dialogue.

In my limited experience with subtitles over the past 2 years,
I would put the Beeb's subs into 2 categories:
1. The ones that in the early days of iPlayer were labeled as "prepared";
these are made by a dedicated third-party service, are usually very
accurate and in-sync with audio and always end with
"Subtitles By Red Bee Media Ltd"
2. The ones that in the early days of iPlayer were labeled as "live";
these are machine-generated transcripts of the live (audio) TV feed.
More often than not, they are of low quality, usually off-sync with
the spoken audio and with missing dialogue - or may contain repeated
subtitle lines - in general they are beyond easy repair and I avoid
them altogether.
 Contrary to their "live" label, they are not limited to live shows -
typical examples are Friday's pre-recorded Jools Holland show (when
subs do come with it) and most content on the BBC News Channel (namely
"Click" and other shows there that I am exposed to via BBC World News
Channel).
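
The two categories above suggest a rough check to run on a downloaded
.srt file before blaming get_iplayer. What follows is a hedged sketch in
Python, based only on the observations in this mail: the function names
are hypothetical, the Red Bee credit is treated as a heuristic marker
for "prepared" subs, and only back-to-back identical cues are counted
as repeats.

```python
import re


def parse_srt(text):
    """Return the text of each cue, with index and timing lines dropped."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Expected layout: index, timing line, then the text lines.
        if len(lines) >= 3 and "-->" in lines[1]:
            cues.append(" ".join(line.strip() for line in lines[2:]))
    return cues


def looks_prepared(cues):
    """Heuristic: "prepared" subs end with the Red Bee Media credit."""
    return bool(cues) and "red bee media" in cues[-1].lower()


def repeated_cues(cues):
    """Return 0-based positions of cues identical to the one before."""
    return [i for i in range(1, len(cues)) if cues[i] == cues[i - 1]]


sample = (
    "1\n00:00:01,000 --> 00:00:02,000\nHello\n\n"
    "2\n00:00:02,000 --> 00:00:03,000\nHello\n\n"
    "3\n00:00:03,000 --> 00:00:04,000\nSubtitles By\nRed Bee Media Ltd\n"
)
cues = parse_srt(sample)
print(looks_prepared(cues), repeated_cues(cues))
```

A file with no closing credit and several repeats is a likely "live"
transcript, though neither sign is conclusive on its own.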

Again, cheers dinkypumpkin for this latest fix!
V. 



