Further Thoughts On The Data Issue

C E Macfarlane c.e.macfarlane at macfh.co.uk
Sat Nov 1 11:43:46 PDT 2014


> Steven Maude Wrote:
>
> There's three ways something like this could be used:
>
> 1. Client-side scraping of programme data (maybe a Perl script that more
> directly hooks into get_iplayer would be better?)
>
> It would take some time to populate the programme data. Scraping the
> index pages for TV actually doesn't take that long, but in some cases
> you'd have to pull out individual programme pages to get all the episode
> info for them. As is, my script just gets the most recent episode.

I have also thought about this.

For one thing, such an approach would be extremely prone to failure.  As we
have seen, the BBC are constantly changing things, and this includes their
website (and, IMO, every time they change it, the result is less ergonomic,
and therefore less usable, at least with old browsers, than it was before  -
the current site is horrendous: it's desperately slow, difficult to search,
and it takes forever to go systematically through the programmes available,
but I suppose in this context that's only peripherally relevant).

But, apart from that, the BBC have rather shot themselves in the foot,
because if we all start scraping their site in this way, the additional
traffic would certainly be most unwelcome and perhaps might even bring it
down.

However, there is a possible variant.  There are existing programs that may
be open to adaptation, for example Digiguide, http://digiguide.tv/.  Both
the web-page based Digiguide.tv Premium  -  a mere £2.99/yr, which I would
imagine everyone here would be happy to pay  -  and the program Digiguide
For Windows  -  rather more pricey at $14.99/yr for the latest version,
although when I renewed yesterday using an old version, 8.3, it was only
£9.99  -  give much better and quicker screen displays than the BBC iPlayer
site, are far easier to use, and to varying extents can be customised.
Already with D4W one can customise how alerts look and behave, and can even
script their behaviour using JavaScript; add-ins have already been created
to set up recordings on TV tuner cards, etc.

Ideally, one would prefer the relevant iPlayer information to be displayed
by the program without extra scripting, and accordingly I've just created a
post in the Suggestions forum: http://forums.digiguide.tv/forum.asp?id=2.
However, even if that makes no headway, rather than all users scraping the
BBC site for all programmes, it would make more sense for each user to
scrape only those details relevant to that particular user's interests, as
defined by the alerts that (s)he has set within the program, and I'm now
just beginning to look into doing this via the alert process.  It seems to
me that it should be possible to get Digiguide For Windows to launch
GetIPlayer For Windows automatically a few minutes after the programme to be
recorded ends, or, if it is already running, to add the programme's details
to a queue instead.
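The launch-or-queue idea above could be sketched roughly as follows.  This
is only an illustration of the queueing logic, under the assumption that a
D4W alert script can shell out to an external command; the file names and
the idea of a lock file are my own inventions, not anything Digiguide or
get_iplayer actually provides:

```python
# Hypothetical sketch: a tiny helper a Digiguide alert could invoke once
# per ended programme.  The programme PID is appended to a queue file;
# the first instance then drains the queue by calling get_iplayer
# --pid=... for each entry, while later instances just enqueue and exit.
import os
import subprocess
import sys

QUEUE_FILE = "gip_queue.txt"   # assumed location; adjust to taste
LOCK_FILE = "gip_queue.lock"

def enqueue(pid, queue_file=QUEUE_FILE):
    """Append a programme PID to the shared queue file."""
    with open(queue_file, "a") as f:
        f.write(pid + "\n")

def drain(queue_file=QUEUE_FILE, lock_file=LOCK_FILE, runner=None):
    """Download every queued PID, unless another instance holds the lock."""
    if runner is None:
        # Real invocation; assumes get_iplayer is on the PATH.
        runner = lambda pid: subprocess.call(["get_iplayer", "--pid=" + pid])
    try:
        # O_EXCL makes lock creation atomic: only one drainer wins.
        fd = os.open(lock_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return []  # another instance is already draining the queue
    try:
        os.close(fd)
        if not os.path.exists(queue_file):
            return []
        with open(queue_file) as f:
            pids = [line.strip() for line in f if line.strip()]
        for pid in pids:
            runner(pid)
        os.remove(queue_file)
        return pids
    finally:
        os.remove(lock_file)

if __name__ == "__main__" and len(sys.argv) > 1:
    enqueue(sys.argv[1])
    drain()
```

The lock file is what turns "launch it, or add to the queue if already
running" into safe behaviour when two alerts fire close together.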

> 2. Server-side scraping of programme data so that users could scrape on
> a server and set up a feed users can access.

There would undoubtedly be copyright issues with this, as the metadata must
be owned by the BBC.  IMO, a better, though long-term, approach would be to
tackle this from the head down.  I've just created a government petition
which, if accepted (well, let's dream for a while), would require all
government and government- or publicly-funded institutions to use Open
Source software and Open Data standards wherever reasonably possible,
specifically mentioning the BBC, C4 and other recipients of PSB funding.
I'll link to it if and when it gets put out to signatories.

If ever implemented, we could use that as a stick to beat the BBC into
reinstating the lists.

> The advantage of this is that it would be much quicker for users as you
> could access the processed feed in a single HTTP request (rather than
> hitting the BBC site numerous times).

Yes.

> However, dinkypumpkin mentioned that centralising a feed wasn't a
> preferred option. That said, there's nothing to stop having a
> user-specified option to point get_iplayer to a specific feed URL. If
> someone hosts a feed, then decides to take it down, someone else could
> take over.

Yes, and it's also a central point of failure, and easy for the BBC to 'go
after', either by technical means, such as IP blocking, or legally.  It
would be better to implement such a solution by Peer-To-Peer sharing, each
user only getting a fraction of the data, and sharing it with everyone else
to make up the whole.
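To illustrate the "each user holds a fraction" idea  -  and this is purely a
sketch of the splitting and reassembly, with the actual peer-to-peer
transport (discovery, exchange between users) left entirely out, and all
names invented:

```python
# Illustrative sketch only: how a programme-index feed might be dealt
# out so each peer need hold only a fraction of the data, with the full
# index recoverable once all shards have been collected, and a checksum
# so peers can verify what they received.
import hashlib
import json

def split_feed(entries, num_peers):
    """Deal feed entries round-robin into num_peers shards."""
    shards = [[] for _ in range(num_peers)]
    for i, entry in enumerate(entries):
        shards[i % num_peers].append(entry)
    return shards

def merge_shards(shards):
    """Reassemble the full feed from collected shards, ordered by pid."""
    merged = [e for shard in shards for e in shard]
    return sorted(merged, key=lambda e: e["pid"])

def shard_digest(shard):
    """Checksum a shard so a peer can verify it arrived intact."""
    blob = json.dumps(shard, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()
```

A real implementation would also need redundancy (each fraction held by
more than one peer), or a single absent user would leave a hole in the
index.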

> Both of those would need get_iplayer to be modified.

That's now neither here nor there, as it already has been!

> 3. It would be possible to use the output of this scraper client-side to
> search for programmes of interest, and then call get_iplayer with the
> appropriate pid to download the programme if any are found. More work
> would be needed for this, and it would be hacky, but could work too.
> This wouldn't need get_iplayer to be modified; it would just use the
> existing pid download feature.
>
> If there's interest, I'm happy to work on wrangling out get_iplayer
> compatible feed data. (A guide to the structure of the iPlayer feeds
> would be handy.)

Please don't follow the current GIP convention of using a vertical bar as a
field separator.  Use an open data standard such as, in my order of
preference, JSONP, JSON, XML, TSV, or CSV.  This would allow other solutions
or part-solutions to interface more easily.
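The difference is easy to show.  A hypothetical illustration follows, with
invented field names  -  this is not get_iplayer's actual cache schema,
just the same record in both styles:

```python
# The same programme record as a pipe-separated line versus JSON.
import json

fields = ["pid", "name", "episode", "channel"]
record = "b04abc12|Horizon|Inside the Dark Web|BBC Two"

# Pipe-separated: the consumer must know the field order out of band,
# and the format breaks if a title ever contains a '|' character.
values = record.split("|")
as_dict = dict(zip(fields, values))

# JSON: self-describing, order-independent, and any character in a
# value is safely escaped.
as_json = json.dumps(as_dict, sort_keys=True)
```

Any language with a JSON library can then consume the feed without
hand-rolled parsing, which is the whole point of an open standard.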

Regards,
Charles Macfarlane

www.macfh.co.uk/CEMH.html




More information about the get_iplayer mailing list