Thu Aug 29 10:24:58 EDT 2013

Don Grunbaum <don at grunbaum.co.uk> wrote:

> The BBC certainly aren't infallible. The recent 2 part version of The 39
> Steps has the first episode labelled The Thirty Nine Steps and the second
> labelled The Thirty-Nine Steps. Took me a little while to spot why "record
> series" hadn't picked up the second episode.
>
> This seems to happen a lot with radio - i.e. the first episode is labelled
> differently to the remaining episodes.

I don't use the PVR part of get_iplayer, but even using the get_iplayer
commands directly to search the cache and/or fetch something, I'm beginning
to adopt 'patterns' rather than straightforward text substrings for names of
programmes, because of this problem.  So for example

  get_iplayer --type=radio --get "Thirty.*Nine"

If you look at long help (ie call get_iplayer with --help-long specified)
you'll notice that it says that a search parameter can be a 'REGEX'. 

REGEXes are 'regular expressions' and are basically definitions of patterns
of characters.  (That name makes no sense in day-to-day spoken English but
is an academic term: it means that such a pattern (ie an expression) obeys
formal rules on how it is put together.  These formal rules are known in
computing as a 'grammar' and the particular type of grammar that these
patterns obey is a 'regular' one.  This makes it possible to program a
computer to understand unambiguously what such a pattern means.)

In the simplest case a plain fragment of text is itself a regex; if you
specify eg

   "Thirty Nine"

that will match anything in the cache which contains the string "Thirty
Nine"; people specifying values like this may not realise they are using
regexes.

But you could also specify for example:

   "Thirty.*Nine"

and that will match anything in the cache which contains "Thirty" followed
by "Nine", with any number of arbitrary characters (including none at all)
in between.  So this will work whether a programme is called "ThirtyNine" or
"Thirty    Nine" or "Thirty-Nine" or even "Thirty hungry people ate Nine
cakes"... so it's still not perfect!
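
To try patterns like these without touching the cache, here is a rough
sketch in Python rather than get_iplayer itself.  Assumptions: the list of
titles is made up (only the two BBC labellings come from this thread), the
helper name matching() is my own, and I've made the match ignore case.  The
last pattern, "Thirty[ -]?Nine", is just one possible way of tightening
things up, allowing at most one space or hyphen between the two words.

```python
import re

# Hypothetical cache entries; only the first two labels come from the thread.
titles = [
    "The Thirty Nine Steps",
    "The Thirty-Nine Steps",
    "ThirtyNine",
    "Thirty hungry people ate Nine cakes",
]

def matching(pattern, names):
    """Return the names that contain a match for pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

plain = matching("Thirty Nine", titles)      # literal text with exactly one space
loose = matching("Thirty.*Nine", titles)     # anything (or nothing) in between
tight = matching("Thirty[ -]?Nine", titles)  # at most one space or hyphen between

print(plain)
print(loose)
print(tight)
```

The plain text only finds the first title, "Thirty.*Nine" finds all four
(hungry people included), and the tighter pattern finds the first three.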

In another example: recently there was a TV series of three programmes all
named Dreaming The Impossible... and I wanted to fetch two of the three
episodes, having watched the other one live.  If I had just told get_iplayer
to get "Dreaming the" it would have fetched all three.  So I used a pattern:

 (dreaming the).*(connections|revolution)

which meant: match any entries in the cache which contained

  (dreaming the)             - ie "dreaming the" as a single entity
  .*                         - followed by arbitrary text
  (connections|revolution)   - followed by a single entity which is 
                               either "connections" or "revolution"

The strings "connections" and "revolution" were in the episode names, IIRC,
which get_iplayer looks at if (as I do) you also use --long in a search.

The brackets around "dreaming the" aren't absolutely essential but possibly
make the expression easier to read.  

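The brackets around (connections|revolution) are essential because they tell
the part of get_iplayer which reads these expressions that the stuff inside
the brackets is one section of the pattern, even though it contains a pair
of options.  Without them the | would split the whole pattern in two, so it
would mean either "dreaming the" followed later by "connections", or just
"revolution" on its own anywhere in the entry.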
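
The effect of those brackets can be seen in a small Python sketch (again not
get_iplayer itself).  Assumptions: the entries below are invented, with
episode names guessed from the two words quoted above plus an unrelated
"Revolution Radio" entry, the helper matching() is my own, and matching
ignores case.

```python
import re

# Hypothetical "name - episode" strings, not real cache entries.
entries = [
    "Dreaming The Impossible - Connections",
    "Dreaming The Impossible - Revolution",
    "Dreaming The Impossible - Something Else",
    "Revolution Radio",
]

def matching(pattern, names):
    """Return the names that contain a match for pattern, ignoring case."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [n for n in names if rx.search(n)]

# With brackets: the | only chooses between the two episode words.
grouped = matching(r"(dreaming the).*(connections|revolution)", entries)

# Without brackets: the | splits the whole pattern into two alternatives,
# "dreaming the.*connections" or plain "revolution".
ungrouped = matching(r"dreaming the.*connections|revolution", entries)

print(grouped)
print(ungrouped)
```

With the brackets only the two wanted episodes are listed; without them the
unrelated "Revolution Radio" entry is picked up as well.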

Using regexes with their funny layout and meaning needs practice.  If you
google for "perl regex" or "pcre" (which means Perl Compatible Regular
Expressions, a library that imitates the 'flavour' of regex that Perl
understands) you'll find lots of help on how to build expressions.

-- 
Jeremy Nicoll - my opinions are my own.