parser error

Jeremy Nicoll - ml gip jn.ml.gti.91 at wingsandbeaks.org.uk
Sat Oct 28 04:02:36 PDT 2017


On 2017-10-27 21:47, RS wrote:

> If you are both right about the strictness of the standard, and I have
> to defer to your superior knowledge, why does XML::LibXML have options
> for recovery and validation? According to
> http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#PARSER_OPTIONS
> and
> http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Error.pod
> it also has a choice of Verbose and Quiet error handlers.  Authors can
> use their own error handlers, or remove the error handler altogether.

The most obvious reason would be to use XML::LibXML as a validator
before releasing files, so that you could be certain they were
properly formed.

I think 'recovery' in this sense merely means the parser returns an
error code; there's nothing to suggest that you can then go on and
make data-extraction calls against the XML file... you'll just keep
getting the error code.


> An example given is recovery from a missing closing tag.

- which is no use in this situation, where the NUL occurs before any
of the data you're interested in.


> I have not
> seen a definition of fatal error.  Is a spurious NUL a fatal error?

I think so, according to that original Wikipedia article, because it
said that a NUL is one of the few characters that can never be valid
in an XML document.
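
As a rough illustration of that point, here is a minimal sketch in
Python (not Perl, and nothing to do with get_iplayer's own code) that
feeds the same small document to the standard-library parser with and
without a stray NUL. The helper name is_well_formed and both sample
documents are made up for the example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(data):
    """Return True if the stdlib parser accepts data, False if it rejects it."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Hypothetical sample documents; the second has one NUL byte in the text.
good = b"<tt><p>Hello</p></tt>"
bad = b"<tt><p>Hel\x00lo</p></tt>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

A strict parser gives up at the NUL itself, wherever in the file it
happens to be.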


> I suspect it is less serious than a missing closing tag.

Not if the parser, knowing it can NEVER be valid, stops right there.


> It is easy to recover from; you just ignore it.

There's no reason to ignore it.  By definition, finding one means
that you do not have a well-formed XML file.
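
If the aim is to report the corruption rather than hide it, a scan
along these lines would show where the offending characters are.
This is only a sketch in Python: the character ranges are my reading
of the "Char" production in the XML 1.0 specification, and the
function name illegal_char_offsets is invented for the example.

```python
import re

# Anything outside the XML 1.0 "Char" production (assumed ranges:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD, U+10000-U+10FFFF).
_ILLEGAL = re.compile(
    "[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def illegal_char_offsets(text):
    """Return (offset, code point in hex) for each disallowed character."""
    return [(m.start(), hex(ord(m.group()))) for m in _ILLEGAL.finditer(text)]

# Hypothetical input with a NUL at offset 2.
print(illegal_char_offsets("ab\x00cd"))
```

The offsets can then be passed on to whoever produced the file.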


> Subject to what anyone may tell me,
> I would have thought non-matching tags would be more likely to be a
> fatal error.

Well, HTML - which has looser parsing criteria - does manage that sort
of thing.  But HTML is not XML.


> It must be remembered that an important function of XML, in contrast
> to other mark up languages, is that it is human readable as well as
> machine readable.

OTOH the designers of XML clearly felt that well-formedness was just
as important.


> Error recovery must always be appropriate for the importance of
> integrity of the data and the probability of errors.  I can understand
> there are applications where strict compliance is necessary, but
> subtitles does not seem to me to be one of them.

Then take that up with the BBC and tell them that their choice of XML
for these files is inappropriate.


> Subtitles for this film used to work with XML::Simple.  A problem only
> occurred with the move to XML::LibXML to support coloured subtitles.

Surely the problem is that this specific XML file is corrupted?

Are you finding that every single XML file is corrupt?

-- 
Jeremy Nicoll - my opinions are my own


