New radio PIDs, more than 8 characters - "solved"

C E Macfarlane c.e.macfarlane at macfh.co.uk
Mon Aug 14 07:47:47 PDT 2017


More about regular expressions and programming follows, which those with no or little interest in either can safely ignore ...
-- 
www.macfh.co.uk/MacFH.html

>     > I think what Charles was meaning is that if you were using 
>     --url "http://www.bbc.co.uk/programmes/b08xy0gl" rather than 
>     a direct PID then the code is looking for something starting 
>     with either b, p or w followed by between 7 and 14 letters or 
>     numbers and the first thing it hits that matches all that 
>     criteria is the word "programmes". Like you say, GiP wouldn't 
>     return any VPID info but as it finds programmes to be a valid 
>     PID, it won't keep looking for the proper PID in that URL so 
>     would never be able to download from a URL.

Yes, it depends very much on the intended use for the regular expression (RE).

The most general situation is trawling any text, such as HTML, WITHOUT REGARD TO CONTEXT, to capture something resembling a PID.  In this situation, even the correction I suggested may not be adequate.  It might be necessary to bracket it at the beginning and end with a non-word character class placed outside the capturing group, the representation of which can differ from language to language but is usually \W, as it is in Perl, so ...
	\W([bpw][0-9][a-z0-9]{6,13})\W
... should capture PIDs reasonably accurately without regard to context, though I wouldn't rely even on this without a deal of testing with many actual examples of text to be trawled.
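As a rough illustration of that sort of testing, here is a minimal sketch in Python, whose `re` syntax for these particular patterns is the same as Perl's. It assumes that 8-character PIDs such as b08xy0gl must still match, so the tail quantifier is {6,13}. The lookaround variant is my own assumption rather than anything from the thread: because \W has to consume a real character on each side, a PID at the very start or end of the text is missed by the bracketed form.

```python
import re

# Assumption: letter + digit + 6..13 more characters, i.e. 8..15 in total,
# so that an 8-character PID such as b08xy0gl still matches.
PID_CORE = r"[bpw][0-9][a-z0-9]{6,13}"

# Bracketed with \W: needs a non-word character on BOTH sides of the PID.
PID_IN_TEXT = re.compile(r"\W(" + PID_CORE + r")\W")

# Variant (an assumption, not from the thread): lookarounds check the
# neighbouring characters without consuming them, so a PID at the very
# start or end of the string is also accepted.
PID_ANYWHERE = re.compile(r"(?<![a-z0-9])(" + PID_CORE + r")(?![a-z0-9])")

html = '<a href="http://www.bbc.co.uk/programmes/b08xy0gl">Episode</a>'
url = "http://www.bbc.co.uk/programmes/b08xy0gl"

print(PID_IN_TEXT.findall(html))   # PID is followed by a quote mark
print(PID_IN_TEXT.findall(url))    # PID is at the end: nothing after it
print(PID_ANYWHERE.findall(url))   # lookarounds cope with end of string
```

Neither pattern picks up the word "programmes", because the second character has to be a digit.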

However, if you already know something about the context, then of course that makes things easier.  The correction I suggested should pick PIDs out of URLs more elegantly and simply, in a single statement in fact, than either the original suggestion or programming to implement the following pseudo-code ...

>     pseudo code
>     if --url
>       strip characters following last /
>       use as pid
>       validate_pid
>     end-if

... particularly as URLs exist with other characters after the PID, though perhaps these might not be used in the context of GiP.
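To make the comparison concrete, here is a hedged sketch in Python of both approaches. The function names and the trailing "/player" path segment are hypothetical, chosen only for illustration, and the pattern assumes a PID is a letter, a digit, then 6 to 13 further letters or digits. It is not GiP's actual code.

```python
import re

# Assumed pattern: a slash, then letter + digit + 6..13 more characters,
# not followed by a further letter or digit.
PID_IN_URL = re.compile(r"/([bpw][0-9][a-z0-9]{6,13})(?![a-z0-9])")


def pid_from_url(url):
    """Return the first PID-like path segment in url, or None."""
    match = PID_IN_URL.search(url)
    return match.group(1) if match else None


def pid_after_last_slash(url):
    """The quoted pseudo-code: take whatever follows the last '/'."""
    return url.rsplit("/", 1)[-1]


plain = "http://www.bbc.co.uk/programmes/b08xy0gl"
trailing = plain + "/player"   # hypothetical URL with extra characters

print(pid_from_url(plain), pid_after_last_slash(plain))
print(pid_from_url(trailing), pid_after_last_slash(trailing))
```

On the plain URL both approaches agree. On the one with a trailing segment, only the regular expression still returns the PID, while the last-slash approach returns the trailing segment and leaves the validation step to reject it.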
     
>     Anyway...
>     changing all 7 occurrences

:-(

>	(sigh...)

I think in my case that would more likely have been '(expletive deleted)'!

>	of
>     [bp]0[a-z0-9]{6}
>     to
>     (?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})
>     solves the w3*, w1* problem for Me.

> Also. No disrespect intended to Dinkypumpkin as "he's" only picked-up
> existing code but, as an ex-programmer I'm horrified by the code
> repetition.  Doesn't Perl allow 'functions'?  i.e. if valid_pid ...
> where valid_pid contains said validation.

Yes, grateful though I am, probably along with all of us here, for GiP's wonderfully useful functionality, when I first looked at its code, I rejected any idea of contributing many actual programming suggestions, because I'd feel I had to completely rewrite the program rather than just tinker with it!

I can't remember where now, whether it was from a book, or a 6th form college or university course, but somewhere somehow I acquired a mental list of very basic things to get right when programming ...

Pretty much the first item in that list was to declare constants at the beginning of the program containing all the fixed or semi-fixed values that the program needed, so that if one of them changed, you only had to change the one easily-found line at the beginning where the value was declared, not the possibly tens, hundreds, even thousands of lines throughout the rest of the program where that value was used.  A template for BBC URLs and an RE to capture PIDs would both obviously be prime examples of this.

As you suggest, another, probably second or third on the list, was to put oft-repeated code in subroutines/functions.
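A minimal sketch of those two principles together, in Python rather than Perl and not taken from GiP itself. The names are hypothetical, the URL template is an assumption modelled on the URL quoted above, the "w" PID in the usage lines is made up, and the RE is the alternation quoted earlier in this thread.

```python
import re

# Declared once, at the top: change these two lines and every use follows.
# The template is an assumption modelled on the URL quoted in the thread.
PROGRAMME_URL_TEMPLATE = "http://www.bbc.co.uk/programmes/{pid}"
# The alternation quoted earlier in the thread.
PID_PATTERN = re.compile(r"(?:[bp]0[a-z0-9]{6}|w[a-z0-9]{7,14})")


def is_valid_pid(candidate):
    """True if the WHOLE string looks like a PID."""
    return PID_PATTERN.fullmatch(candidate) is not None


def programme_url(pid):
    """Build a programme URL, refusing anything that is not a PID."""
    if not is_valid_pid(pid):
        raise ValueError(f"not a PID: {pid!r}")
    return PROGRAMME_URL_TEMPLATE.format(pid=pid)


print(is_valid_pid("b08xy0gl"))    # 8-character PID from the thread
print(is_valid_pid("w3cs0000"))    # made-up 'w' PID, for illustration
print(is_valid_pid("programmes"))  # the word that caused the trouble
print(programme_url("b08xy0gl"))
```

With this layout, the change from the old RE to the new one would have been one edit rather than seven.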

When I got out into the 'real' world, I was appalled to find that code that disregarded most or all of the principles outlined in my mental list was actually widespread, perhaps even in the majority!  I sometimes think it's a near miracle that some programs ever run correctly at all!

Regards.



