get_iplayer repair update #1

Alan Foster alan.c.foster at gmail.com
Tue Nov 4 10:49:49 PST 2014


Thank you for your hard work on get_iplayer.

Alan.

On 1 November 2014 00:45, dinkypumpkin <dinkypumpkin at gmail.com> wrote:
> get_iplayer has been more or less repaired, but there are still some wounds.
> I'm going to release what I have on Sunday.  I'm on the road next week, so
> I've run out of time to do more for the time being. Consider it a stopgap
> until progress can be made on other fronts. This is where things are:
>
> 1. I've disabled code related to the discontinued feeds, so you shouldn't
> get any more bogus values in your metadata tags.  You should also see
> thumbnails again in files < 7 days old downloaded via PID.
>
> 2. The new release will support entry of multiple PIDs.
>
> 3. I've more or less restored the 7 day cache for TV and radio.  There are
> still some holes in it:
>
> a. It is not possible to search for audiodescribed versions of programmes.
> I haven't been able to source that information.  If anyone has any clues on
> the subject, chime in - but not if your suggestion is to scrape the iPlayer
> site.  That isn't on the table just yet.
>
> You can still download audiodescribed versions, but you'll have to look for
> them on the iPlayer site.  Signed versions should still be flagged in the
> get_iplayer cache, but some may be missing.  Again, check the iPlayer site
> if in doubt.
>
> I've changed get_iplayer to always scrape the related episode page to look
> for audiodescribed/signed versions when requested, so hopefully more
> downloads will be successful.  I found a number of cases where the playlist
> data for recent programmes didn't contain identifiers for audiodescribed
> versions even though they existed on the iPlayer site.
>
> b. It is not possible to search radio programmes by category. TV programmes
> still have category information. There is a source for radio category
> information, but it consistently fails for Radio 4 and Radio 4 Extra, which
> is where the categories are most meaningful.  I know that is going to break
> some PVR searches, but the alternative is a support headache I can't absorb.
>
> c. I can't vouch that every programme from the previous 7 days will show up
> in the cache. As always, you can use the PID for any programme not in the
> cache. By the same token, I can't vouch that every programme in the cache
> will be downloadable.  The new feeds contain noticeably more programmes,
> some due to the inclusion of web-only stuff. With the heavier load, cache
> refreshes are noticeably slower than with the old feeds, ca. 90 seconds for
> me for tv+radio.
>
> 4. The more-or-less restored cache depends on some old data feeds lingering
> at the BBC.  Recent events have taught us that they could disappear without
> warning, so I've implemented a fallback mechanism. There will be a new
> option that will switch the cache to refresh from the channel schedule pages
> instead of the old data feeds.  However, this fallback is also limited:
>
> a. It is not possible to search for audiodescribed or signed versions of
> programmes.  That information isn't in the schedule pages.
>
> b. It is not possible to search TV or radio programmes by category. Again,
> that information isn't in the schedule pages.
>
> c. Cache refresh is slow, ca. 4+ minutes for a full TV and radio refresh for
> me.  The time could be cut by about 1/3 by removing regional TV channel
> variations, but it cuts out 50+ programmes, so I've left them in for the
> present.
>
> d. It appears that fewer programmes from the previous 7 days get cached
> compared to the feeds.  Part of that is because the schedule pages don't
> show most web-only programmes.  Part of it may also be because I'm checking
> availability info in the schedule pages more strictly than whatever produces
> the data feeds.  Again, you can use the PID for anything not in the cache.
>
> e. The only plus to using the schedule pages to populate the cache is that
> it becomes possible to expand your cache out to 30 days.  It seems to work
> OK, if you have 10-15 minutes to refresh your cache.  There will be an
> option for this.
>
> f. I've given you enough rope to hang yourself, but don't put this fallback
> option into regular use unless it becomes necessary - seriously.  It's only
> there to avoid weeks like this one.  I won't be interested in hearing how
> slow it is or how it doesn't locate some particular programme.  And for
> Pete's sake *don't* use it with the Web PVR.  If you insist on playing
> around with it, you'll probably want to bump up --expiry to some gigantic
> number and refresh your cache manually as needed.
>
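
[Illustration, not part of the original message.] The advice above about
raising --expiry and refreshing manually comes down to a cache-age check. A
minimal sketch of such a check follows, assuming the expiry is given in
seconds and the cache is a single file whose modification time marks the last
refresh. The function name cache_is_stale and its parameters are hypothetical
and are not taken from get_iplayer itself.

```python
import os
import time


def cache_is_stale(cache_path, expiry_seconds, now=None):
    """Return True if the cache file is missing or older than expiry_seconds.

    Hypothetical helper for illustration only; get_iplayer's own logic
    may differ.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:
        # No readable cache file: treat it as needing a refresh
        return True
    return age > expiry_seconds
```

With a very large expiry value the check never reports the cache as stale, so
a refresh only happens when one is requested explicitly.
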
> 5. Looking further ahead
>
> Some things that have been floated here in the past few days:
>
> a. Programme data services: If somebody implements something along these
> lines, I'm sure get_iplayer could be integrated with it.  It's clear that
> get_iplayer would never be able to access Nitro if and when it's ever opened
> up.  But, if somebody can repackage Nitro data for wider use, that would be
> pretty useful.
>
> b. iPlayer site scraping: This could also be the foundation of a programme
> data service instead of Nitro.  It is also the only real hope for
> get_iplayer to regain a full-featured desktop cache, though I'm not sure it
> will be practical.  A full scrape is out of the question for local caching -
> there are just too many programmes on the radio side. However, even caching
> just the previous 7 days will be much, much slower than with the old data
> feeds.  The number of requests and the amount of data to move over the wire
> and parse would be vastly greater. Some sort of parallelisation might help.
> The trick will be to figure out the right way to filter the listings down to
> a practical volume.
>
> I started down this road, but it was way too slow for radio and it was going
> to be too much work for the time available.  Plus, it didn't seem worth
> leaving get_iplayer crippled any longer than necessary.  To do this properly
> will likely mean adding some dependencies to get_iplayer as well as some
> major reworking.  I'm going to keep working in that direction just to see if
> it can be done, but no idea if it will be of practical use.
>
> Also see Steven Maude's recent post for his take on the problem.
>
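
[Illustration, not part of the original message.] The remark that "some sort
of parallelisation might help" could look roughly like the sketch below, which
fetches many listing pages concurrently with a thread pool. It is only an
outline under stated assumptions: fetch_all and fake_fetch are hypothetical
names, the page identifiers are placeholders, and the fetch function is a
stand-in that makes no real network requests.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Call fetch(url) for every URL concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))


def fake_fetch(url):
    # Stand-in for a real HTTP request (assumption for illustration)
    return "listing for " + url


pages = fetch_all(["page1", "page2", "page3"], fake_fetch)
```

Threads suit this kind of work because the time goes on waiting for responses
rather than on computation, though parsing the results and filtering the
listings down to a practical volume would still need to be handled separately.
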
> c. External search/indexing applications:  To my mind, it seems like a good
> idea for some energetic person to split this out.  get_iplayer badly needs
> to lose weight, not gain it, and there is a pretty clear functional
> separation between searching and downloading.  get_iplayer needs a lot of
> work in handling metadata that could make it a better downloader, so it
> would be no bad thing to get out of the caching business.  I'll have my pony
> now, thanks.
>
>
> _______________________________________________
> get_iplayer mailing list
> get_iplayer at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/get_iplayer


