Proof-of-concept scraper for iPlayer web frontend TV data to JSON

Rob Dixon rob.dixon at gmx.com
Fri Oct 31 18:09:09 PDT 2014


On 01/11/2014 00:27, dinkypumpkin wrote:
>
> I tried this same approach, but it foundered on radio programmes.  There
> is just too much stuff there.  It's soul-crushingly slow to scrape the
> iPlayer Radio site, at least for a desktop cache.  It would be great to
> have everything available on iPlayer searchable off-site, but there is
> too much of it for get_iplayer's current local caching model.  I'm going
> to have another go at some point.

There is no real need to download *all* of the schedule information;
after all, only a fraction of it will ever be of any use to an
individual user.

I would use the BBC server to do the search for me, after which there is
little work to be done. For instance, if I look for all Book at Bedtime
episodes with this URL

     http://www.bbc.co.uk/radio/programmes/a-z/by/book%20at%20bedtime/player

then I am taken to a page with a link to the series at

     http://www.bbc.co.uk/programmes/b006qtlx/episodes/player?page=1

through to `page=6`. That amounts to 52 programmes, and even on my
meagre 13 megabit connection fetching them takes less than ten seconds.
The results could be cached to give a practically instantaneous response
to a similar request in the future. There is also the possibility of
writing a batch solution that makes a query only every minute or so and
could be run continuously or overnight.
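
To make the idea concrete, here is a rough sketch of the fetch-and-cache
step. It is in Python purely for brevity and is not the half-written
proof of concept mentioned below. The function names, the JSON cache
file layout and the `delay` parameter are all assumptions of mine; only
the URL pattern and the series id come from the example above. It
stores raw page text and does no parsing of the episode data.

```python
import json
import time
from pathlib import Path
from urllib.request import urlopen

# URL pattern taken from the Book at Bedtime example above.
BASE = "http://www.bbc.co.uk/programmes/{pid}/episodes/player?page={page}"


def page_urls(pid, last_page):
    """Build the list of episode-list URLs for pages 1..last_page."""
    return [BASE.format(pid=pid, page=n) for n in range(1, last_page + 1)]


def fetch(url):
    """Download one page and return its text (real network access)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def cached_pages(pid, last_page, cache_dir, fetcher=fetch, delay=0.0):
    """Return the raw pages for a series, fetching only on a cache miss.

    The cache is one JSON file per series id (a hypothetical layout).
    `delay` is the pause in seconds after each request; something like
    60 would give the slow batch behaviour described above.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file = cache_dir / f"{pid}.json"

    if cache_file.exists():
        # Cache hit: no requests are made at all.
        return json.loads(cache_file.read_text(encoding="utf-8"))

    pages = []
    for url in page_urls(pid, last_page):
        pages.append(fetcher(url))
        time.sleep(delay)

    cache_file.write_text(json.dumps(pages), encoding="utf-8")
    return pages
```

Calling `cached_pages("b006qtlx", 6, "cache")` would fetch the six pages
once; a second call with the same arguments reads the file instead of
hitting the server. A real version would also need to expire old cache
entries, which this sketch ignores.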

I'm more than happy to write a proof of concept if you're interested. I
have it half-written already just to get that timing information.

The one thing that bothers me is the terms and conditions of the web
site. I scanned through them quickly and couldn't find anything about
robotic access, but it would be a first if there isn't anything there.
If it's just a matter of obeying the /robots.txt then I'm more than
happy to go ahead.
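
If it does come down to /robots.txt, the check itself is simple. The
sketch below uses Python's standard `urllib.robotparser`; the
`poc-scraper` user agent name and the sample rules in the usage note
are made up for illustration and are not the BBC's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="poc-scraper"):
    """Check a URL against robots.txt text that has already been fetched.

    `agent` is a hypothetical user agent name for the scraper.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def allowed_live(site, url, agent="poc-scraper"):
    """Same check, but download robots.txt from the site first."""
    parser = RobotFileParser()
    parser.set_url(site.rstrip("/") + "/robots.txt")
    parser.read()  # real network access
    return parser.can_fetch(agent, url)
```

For example, with the invented rules `"User-agent: *\nDisallow: /private/\n"`,
`allowed()` accepts the episode-list URL above and rejects anything
under `/private/`. This only covers robots.txt; it says nothing about
what the site's terms and conditions permit.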

Let me know how I can help.

Rob






