parser error

Jeremy Nicoll - ml gip jn.ml.gti.91 at wingsandbeaks.org.uk
Wed Oct 25 17:27:05 PDT 2017


On 2017-10-26 00:51, RS wrote:

> The corruption he refers to is a few spurious NUL characters in
> <head><metadata>.  The subtitles themselves are in <body> and they are
> intact.

But you're a human looking at the file.  XML files have a tightly
defined syntax (a formal grammar laid down in the XML specification; a
DTD or schema can additionally define the structure of one particular
kind of document).  When a program tries to extract data from an XML
file it does so using standard parser code that knows what the
structure of the file must be.

Anyway, to parse an XML file the parser reads it character by
character, and at every point it knows (from the grammar definition)
exactly what could come next and can classify it as required.

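By definition an XML file is only an XML file if it entirely matches
the grammar that is defined.  As soon as a parser finds a character
that makes no sense, the whole file is classed as corrupt, not an XML
file after all.  A NUL is not a legal character anywhere in an XML
document, so even a few of them in <head> are enough for that.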
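
To see this for yourself, here is a minimal sketch using the XML parser
from Python's standard library.  The sample document is a made-up
stand-in for the subtitle file (not the real one), and the helper name
check_well_formed is my own invention.  I'd expect other conforming
parsers to behave much the same way, but I've only written this against
Python's.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (True, None) if text parses as XML, else (False, error message)."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # The parser gives up at the first character that breaks the grammar.
        return False, str(err)
    return True, None


# Hypothetical stand-in for the subtitle file, not the real one.
GOOD = "<tt><head><metadata>title</metadata></head><body><p>Hello</p></body></tt>"
# The same document with one spurious NUL inside <metadata>.
BAD = GOOD.replace("title", "ti\x00tle")

if __name__ == "__main__":
    print(check_well_formed(GOOD))
    print(check_well_formed(BAD))
```

The second call fails even though the <body> part of the text is
untouched: the parser never gets as far as judging which parts of the
file "matter".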

Much much more at: https://en.wikipedia.org/wiki/XML


-- 
Jeremy Nicoll - my opinions are my own



More information about the get_iplayer mailing list