Proof-of-concept scraper for iPlayer web frontend TV data to JSON
Steven Maude
get_iplayer at stevenmaude.co.uk
Thu Oct 30 17:08:15 PDT 2014
Since I was thinking about scraping iPlayer yesterday, I spent an hour
or two this evening and hacked together this Python script which pulls
programme info from the iPlayer TV category index pages and (for now)
outputs the data as JSON:
https://github.com/StevenMaude/nitroradical
There are three ways something like this could be used:
1. Client-side scraping of programme data (maybe a Perl script that more
directly hooks into get_iplayer would be better?)
It would take some time to populate the programme data. Scraping the
index pages for TV actually doesn't take that long, but in some cases
you'd have to pull out individual programme pages to get all the episode
info for them. As is, my script just gets the most recent episode.
2. Server-side scraping of programme data: someone runs the scraper on
a server and publishes a feed that users can access.
The advantage of this is that it would be much quicker for users as you
could access the processed feed in a single HTTP request (rather than
hitting the BBC site numerous times).
However, dinkypumpkin mentioned that centralising a feed wasn't a
preferred option. That said, there's nothing to stop having a
user-specified option to point get_iplayer to a specific feed URL. If
someone hosts a feed, then decides to take it down, someone else could
take over.
Both of those would need get_iplayer to be modified.
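For option 2, the client side could be as small as the rough Python 3
sketch below. It assumes the hosted feed is a JSON list of programme
entries; the function name and the feed layout are assumptions made for
illustration, not something the scraper or get_iplayer currently
defines.

```python
import json
from urllib.request import urlopen


def load_feed(feed_url):
    """Fetch a pre-scraped programme feed in a single request.

    feed_url is whatever URL the user has chosen to point at; the feed
    is assumed (hypothetically) to be UTF-8 encoded JSON.
    """
    with urlopen(feed_url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Because the URL is just a parameter, switching to a different host if
one feed disappears would only mean changing that one setting.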
3. It would be possible to use the output of this scraper client-side to
search for programmes of interest, and then call get_iplayer with the
appropriate pid to download the programme if any are found. More work
would be needed for this, and it would be hacky, but could work too.
This wouldn't need get_iplayer to be modified; it would just use the
existing pid download feature.
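A minimal sketch of option 3 in Python follows. It assumes the scraper's
JSON output is a list of objects with "title" and "pid" keys; those key
names, and the helper function names, are guesses for illustration and
may not match what the script actually emits. The only get_iplayer
feature relied on is the existing --pid option.

```python
import json
import subprocess


def find_programmes(programmes, search_term):
    """Return entries whose title contains search_term (case-insensitive).

    Each entry is assumed to be a dict with "title" and "pid" keys.
    """
    term = search_term.lower()
    return [p for p in programmes
            if "pid" in p and term in p.get("title", "").lower()]


def build_command(pid):
    """Build the get_iplayer command line to download a single pid."""
    return ["get_iplayer", "--pid=" + pid]


def download_matches(json_path, search_term, dry_run=True):
    """Search the scraper output and call get_iplayer for each match.

    With dry_run=True the commands are only printed, not executed.
    """
    with open(json_path) as f:
        programmes = json.load(f)
    commands = [build_command(p["pid"])
                for p in find_programmes(programmes, search_term)]
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.call(command)
    return commands
```

Something like download_matches("programmes.json", "newsnight") would
then print the commands it would run; the file name here is only a
placeholder.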
If there's interest, I'm happy to work on wrangling out
get_iplayer-compatible feed data. (A guide to the structure of the
iPlayer feeds would be handy.)