New radio PIDs, more than 8 characters

Tue Aug 15 04:14:12 PDT 2017

C E Macfarlane wrote:
> Thinking about this a bit more, I wouldn't wish to claim a spurious hit was
> more likely with no upper limit, but nevertheless I would still regard it
> better programming practice to have one  -  with normal written English, the
> potential for spurious hits would be low, and in the event of one it would
> be delimited quickly by the next space, but if you were trawling raw HTML or
> similar code, which might contain long strings of pseudo-random characters
> as not just PIDs, but also GUIDs, session keys, and the like, then the
> potential for spurious hits would be very much increased, so more would be
> found, and in the interests of program efficiency you'd want them to be
> delimited sooner rather than later.

This is reasonable.  The regexp without an upper limit, sourced from the
BBC's code, is used to confirm that a given string consists only of
characters from the set allowed in a PID.  In most cases the
string which is passed in is explicitly extracted from the request URL, 
as the application in question is a server-side, web-based one.  For 
such purposes I think the lack of an upper limit is completely 
acceptable, but if you're writing code to extract a valid PID from text 
of unknown length or complexity, that regexp is probably not very efficient.
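
To illustrate the difference between the two uses, here is a rough Python
sketch.  The character class (digits plus lowercase consonants), the
minimum of 8, the upper limit of 15 and the function names are all
assumptions made for the example; none of this is the BBC's actual code.

```python
import re

# Assumption for illustration only: a PID is made of digits and
# lowercase consonants.  Not taken from the BBC's code.
PID_CHARS = "0-9b-df-hj-np-tv-z"

# Validation: the whole string must consist of PID characters.
# No upper limit is needed because the input is already isolated.
VALIDATE = re.compile(rf"[{PID_CHARS}]{{8,}}")

# Extraction from arbitrary text: bound the length (hypothetical limit)
# and refuse candidates embedded in a longer alphanumeric run.
MAX_LEN = 15
EXTRACT = re.compile(
    rf"(?<![0-9a-z])[{PID_CHARS}]{{8,{MAX_LEN}}}(?![0-9a-z])"
)


def is_valid_pid(candidate: str) -> bool:
    """Return True if the whole string looks like a PID."""
    return VALIDATE.fullmatch(candidate) is not None


def find_pids(text: str) -> list[str]:
    """Return PID-like tokens found in free text."""
    return EXTRACT.findall(text)


if __name__ == "__main__":
    print(is_valid_pid("b08y1l4k"))
    print(find_pids("see /programmes/b08y1l4k for details"))
    print(find_pids("sid=" + "b" * 40))
```

With the bounded pattern, a long run of PID-like characters such as a
session key is rejected outright rather than being reported as a hit.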
-- 
James Scholes
http://twitter.com/JamesScholes