Proof-of-concept scraper for iPlayer web frontend TV data to JSON

Steven Maude get_iplayer at stevenmaude.co.uk
Thu Oct 30 17:08:15 PDT 2014


Since I was thinking about scraping iPlayer yesterday, I spent an hour 
or two this evening and hacked together this Python script which pulls 
programme info from the iPlayer TV category index pages and (for now) 
outputs the data as JSON:

https://github.com/StevenMaude/nitroradical

There are three ways something like this could be used:

1. Client-side scraping of programme data (maybe a Perl script that more 
directly hooks into get_iplayer would be better?)

It would take some time to populate the programme data. Scraping the 
index pages for TV actually doesn't take that long, but in some cases 
you'd have to pull out individual programme pages to get all the episode 
info for them. As is, my script just gets the most recent episode.
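
As a rough sketch of what the index-page pass involves, the following 
pulls pids and titles out of episode links. The sample markup and the 
/iplayer/episode/<pid>/ link pattern are assumptions for illustration, 
not lifted from the actual script, and the real pages may well differ.

```python
import re
from html.parser import HTMLParser

# Assumed link shape: /iplayer/episode/<pid>/<slug>. Hypothetical; check
# against the live pages before relying on it.
EPISODE_HREF = re.compile(r"/iplayer/episode/([0-9a-z]+)")


class EpisodeLinkParser(HTMLParser):
    """Collect pid and link text from anchors that look like episode links."""

    def __init__(self):
        super().__init__()
        self.episodes = []
        self._current_pid = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = EPISODE_HREF.search(href)
        if match:
            self._current_pid = match.group(1)
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an episode link.
        if self._current_pid is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_pid is not None:
            # Collapse runs of whitespace in the link text.
            title = " ".join("".join(self._text).split())
            self.episodes.append({"pid": self._current_pid, "title": title})
            self._current_pid = None


def scrape_index(html):
    """Return a list of {"pid", "title"} dicts found in one index page."""
    parser = EpisodeLinkParser()
    parser.feed(html)
    parser.close()
    return parser.episodes
```

Getting every episode of a series would need a second pass over the 
individual programme pages, which is where most of the extra requests 
would come from.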

2. Server-side scraping of programme data, so that someone could scrape 
on a server and publish a feed that users can access.

The advantage of this is that it would be much quicker for users as you 
could access the processed feed in a single HTTP request (rather than 
hitting the BBC site numerous times).

However, dinkypumpkin mentioned that centralising a feed wasn't a 
preferred option. That said, there's nothing to stop having a 
user-specified option to point get_iplayer to a specific feed URL. If 
someone hosts a feed, then decides to take it down, someone else could 
take over.
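
A minimal sketch of the client side of that, assuming the hosted feed is 
simply a JSON list of programme entries served at a URL the user picks. 
The function names and the feed layout are hypothetical; get_iplayer has 
no such option at present.

```python
import json
import urllib.request


def parse_feed(text):
    """Turn the feed body into a list of programme dicts; reject anything else."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of programmes")
    return data


def fetch_feed(feed_url, timeout=30):
    """Fetch the whole feed in one HTTP request from a user-chosen URL."""
    with urllib.request.urlopen(feed_url, timeout=timeout) as response:
        return parse_feed(response.read().decode("utf-8"))
```

Because the URL is just a setting, switching to another host if one goes 
away would be a one-line change for the user.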

Both of those would need get_iplayer to be modified.

3. It would be possible to use the output of this scraper client-side to 
search for programmes of interest, and then call get_iplayer with the 
appropriate pid to download the programme if any are found. More work 
would be needed for this, and it would be hacky, but could work too. 
This wouldn't need get_iplayer to be modified; it would just use the 
existing pid download feature.
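
Roughly, that third approach could look like the sketch below. It 
assumes the scraper output has been loaded as a list of dicts with 
"title" and "pid" keys (an assumption about the JSON layout, not a 
documented format) and that get_iplayer is on the PATH. The helper names 
are made up for illustration.

```python
import subprocess


def find_programmes(programmes, query):
    """Case-insensitive substring match on the (assumed) "title" field."""
    needle = query.lower()
    return [p for p in programmes if needle in (p.get("title") or "").lower()]


def get_iplayer_command(pid):
    """Build the command line for get_iplayer's existing pid download."""
    return ["get_iplayer", "--pid=" + pid]


def download_matches(programmes, query):
    """Call get_iplayer once for each programme whose title matches."""
    for programme in find_programmes(programmes, query):
        subprocess.run(get_iplayer_command(programme["pid"]), check=True)
```

Keeping the command construction separate from the subprocess call makes 
it easy to print the commands first and only run them once the matches 
look right.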

If there's interest, I'm happy to work on wrangling out get_iplayer 
compatible feed data. (A guide to the structure of the iPlayer feeds 
would be handy.)


