Programme Title: Subtitle, or Title: Episode Title?
Jeremy Nicoll - ml get_iplayer
jn.ml.gti.91 at wingsandbeaks.org.uk
Mon Nov 10 16:15:29 PST 2014
From time to time I'm struck by entries in a cache file that appear to split
what I would think of as a programme title with a subtitle into a shorter
title and an episode name...
For example, if one looks at:
http://www.bbc.co.uk/tv/programmes/a-z/by/h/all?page=3
you can see what appear to be three programme groups named:
Hidden Histories
(BBC2, Welsh history)
Hidden Histories: Britain's Oldest Family Businesses
(BBC4, three programmes about family businesses)
Hidden Histories: WW1's Forgotten Photographs
(BBC4, a single programme I think)
When they were available some time ago I watched the Family Businesses
programmes; one of the lines in my download history says:
b03qlp97|Hidden Histories|Britain's Oldest Family Businesses: 2. Toye the Medal Maker|tv|...
- that is, the programme name then was "Hidden Histories" and that episode
was "Britain's Oldest Family Businesses: 2. Toye the Medal Maker".
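To make the field layout concrete, here is a minimal Python sketch of how my own code might pull that line apart. It assumes, purely from the example above, that the first three pipe-separated fields are the pid, the programme name and the episode name; `parse_history_line` is a hypothetical helper, not anything from get_iplayer.

```python
def parse_history_line(line: str) -> dict:
    """Split a download-history line into its leading fields.

    Assumption (taken from the example line only): the first three
    pipe-separated fields are pid, programme name and episode name.
    Everything after that is kept unparsed in "rest".
    """
    fields = line.rstrip("\n").split("|")
    return {
        "pid": fields[0],
        "name": fields[1],
        "episode": fields[2],
        "rest": fields[3:],
    }


entry = parse_history_line(
    "b03qlp97|Hidden Histories|"
    "Britain's Oldest Family Businesses: 2. Toye the Medal Maker|tv|..."
)
print(entry["name"])     # Hidden Histories
print(entry["episode"])  # Britain's Oldest Family Businesses: 2. Toye the Medal Maker
```

The point is that the subtitle ends up in the episode field, so anything keyed on the name field only ever sees "Hidden Histories".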
Similarly the WW1 photos programme in today's tv cache appears as:
|tv|Hidden Histories|b03xsrvv|Unknown|WW1's Forgotten Photographs|||...
which means that, as far as my own computer programs (processing these cache
and history files) are concerned, these entries all seem to refer to the same
overall programme. And I suppose the Welsh history "Hidden Histories"
programmes would also look like the same thing.
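A small Python sketch shows the collision. The field positions are assumptions read off the two example lines above (history: pid|name|episode|...; cache: |type|name|pid|?|episode|...), and `name_and_pid` is a hypothetical helper. Grouping on the name field puts a Family Businesses episode and the WW1 photographs programme into the same bucket:

```python
from collections import defaultdict

# Hypothetical field positions, inferred from the two example lines only:
#   history line: pid|name|episode|type|...
#   cache line:   |type|name|pid|?|episode|...
HISTORY_LINE = ("b03qlp97|Hidden Histories|"
                "Britain's Oldest Family Businesses: 2. Toye the Medal Maker|tv|...")
CACHE_LINE = "|tv|Hidden Histories|b03xsrvv|Unknown|WW1's Forgotten Photographs|||..."


def name_and_pid(line: str, source: str) -> tuple[str, str]:
    """Return (programme name, pid) using the assumed field positions."""
    f = line.split("|")
    if source == "history":
        return f[1], f[0]
    return f[2], f[3]


# Group pids by programme name, as a naive consumer of these files might.
groups: dict[str, list[str]] = defaultdict(list)
for line, source in [(HISTORY_LINE, "history"), (CACHE_LINE, "cache")]:
    name, pid = name_and_pid(line, source)
    groups[name].append(pid)

print(dict(groups))  # {'Hidden Histories': ['b03qlp97', 'b03xsrvv']}
```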
A while ago I looked at get_iplayer's Perl source code, but I'm not at all
fluent in Perl. I had the impression, though, that maybe get_iplayer
concatenates various possible fragments of a programme's name, episode name,
etc. into one long string, then tries to chop it up again. And if it assumes
that a string like "ABC DEF: GHI JKL" should be split on the first colon
(which is sensible IF that was "Programme: Episode"), then it will make a
mistake if the string contains "Prog: Ramme: Episode"...
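Here is a toy Python illustration of that guess. It is not get_iplayer's actual Perl, just two obvious rules (split on the first colon vs. the last colon) applied to real titles from this post. Neither rule is right in every case:

```python
def split_first(title: str) -> tuple[str, str]:
    """Treat everything before the FIRST colon as the programme name."""
    prog, sep, ep = title.partition(":")
    return (prog.strip(), ep.strip()) if sep else (title.strip(), "")


def split_last(title: str) -> tuple[str, str]:
    """Treat everything before the LAST colon as the programme name."""
    prog, sep, ep = title.rpartition(":")
    return (prog.strip(), ep.strip()) if sep else (title.strip(), "")


s = "MasterChef: The Professionals: Series 7"
print(split_first(s))  # ('MasterChef', 'The Professionals: Series 7')
print(split_last(s))   # ('MasterChef: The Professionals', 'Series 7')

t = "Hidden Histories: Britain's Oldest Family Businesses: 2. Toye the Medal Maker"
print(split_first(t)[0])  # Hidden Histories  (matches my history file)
print(split_last(t)[0])   # Hidden Histories: Britain's Oldest Family Businesses
```

Splitting on the first colon reproduces what my history file shows. Splitting on the last colon gives what I'd expect for these two, but it would wrongly chop a title like "The Cruise: A Life at Sea" if it ever appeared with no episode part.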
As DP has had to sweat blood, or juice (do pumpkins have blood?), on parsing
metadata recently, I wondered if any of the newer sources of metadata allow
better discrimination between programme and episode titles. If it's possible
to tell from the metadata sources that something is a programme title, even
if it contains a colon, surely it shouldn't be split there?
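What I have in mind is roughly this Python sketch. The `programme` argument stands for a hypothetical metadata field that names the programme directly; I don't know whether any of the BBC sources actually supply one. When it is present, only the remainder of the full title is treated as the episode, and the colon heuristic is used only as a fallback:

```python
from typing import Optional


def programme_and_episode(full_title: str,
                          programme: Optional[str] = None) -> tuple[str, str]:
    """Split a full title, trusting an explicit programme title if given.

    `programme` is a hypothetical metadata value; when it prefixes the
    full title (ending at a colon or at the end of the string), colons
    inside it are left alone.
    """
    if programme and full_title.startswith(programme):
        rest = full_title[len(programme):]
        # Only accept the match if it ends at a colon or at the end of the string.
        if not rest or rest.lstrip().startswith(":"):
            return programme, rest.lstrip(" :")
    # No usable metadata: fall back to splitting on the first colon.
    prog, sep, ep = full_title.partition(":")
    return (prog.strip(), ep.strip()) if sep else (full_title.strip(), "")


t = "Hidden Histories: Britain's Oldest Family Businesses: 2. Toye the Medal Maker"
print(programme_and_episode(t, "Hidden Histories: Britain's Oldest Family Businesses"))
print(programme_and_episode(t))
```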
And yet, my own code shows a few examples (seen over months, not necessarily
recent) where programme names do have embedded colons in them, e.g.:
"Doctor Finlay: The Further Adventures of a Black Bag"
"Hamish and Dougal: You'll Have Had Your Tea"
"Tim FitzHigham: The Gambler"
"Hinterland: Series 1 (full length)"
"MasterChef: The Professionals: Series 7"
"The Choir: Sing While You Work: Series 2"
"The Cruise: A Life at Sea"
"Vets: Gach Creutair Beo"
Why does it work some of the time and not others? Is it because get_iplayer
assembles descriptions etc from a bunch of different sources (web pages, RDF
pages ... whatever), or is it down to inconsistency in the way the BBC list
their programmes?
Or, is there also the BBC 'brand' to take into account? Maybe all of these
programmes are part of the same overall brand?
--
Jeremy Nicoll - my opinions are my own.