Format of options file

Ralph Corderoy ralph at inputplus.co.uk
Mon Mar 5 05:19:53 PST 2018


Hi Richard,

> > Let's forget Mac for the moment.  Linux text files are POSIX text
> > files; zero or more lines, each terminated by a LF.  See ascii(7).
> > DOS ones use CR followed by LF at the end of each line.
> > 
> > Thus a DOS text file looks like a text file to Linux, but one where
> > the last character at the end of each line, just before the LF, is a
> > CR, which is just another possible byte that could be within the
> > line.  If the operation you're doing doesn't mind that the CR is
> > present then it works;  reading the PID from the start of the line
> > and using it as a key.  But if you're using the data that runs to
> > the end of the line then you'll pick up the unwanted CR and that may
> > do things like corrupt the output.
>
> Essentially you seem to be saying I can't have a CR in a Linux text
> file because it is non-standard.

Nope, I'm not saying that.  `Linux text files are POSIX text files; zero
or more lines, each terminated by a LF.'  Here's a one-line POSIX text
file.

    $ sed -n l oneline
    one\rtwo\rthree$
    $ cat oneline
    three
    $

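The terminal obeys each CR by moving the cursor back to the start of the
line, so `three' overwrites what was printed before it.  In the file,
CR is just another character, a perfectly valid one.  It is data, as
opposed to the LF, which is the line's terminator and so meta-data.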
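
Here is a rough sketch of the same point in Python, using only the
standard library.  `posix_lines' is a name made up for this
illustration, and the DOS-style record is invented sample data: a key
at the start of the line, then data running to its end.

```python
import io


def posix_lines(data):
    """Split bytes into POSIX lines: only LF terminates a line.

    posix_lines is a hypothetical helper for this illustration.
    The returned lines keep their terminating LF.
    """
    return io.BytesIO(data).readlines()


# The same bytes as the `oneline' file: CR is ordinary data in the line.
unix = posix_lines(b"one\rtwo\rthree\n")
print(len(unix), unix)

# An invented DOS-style record read by a program expecting POSIX lines.
dos = posix_lines(b"1234 some title\r\n")
key, rest = dos[0].rstrip(b"\n").split(b" ", 1)
print(key)   # the key at the start of the line is unaffected
print(rest)  # the data running to the end of the line carries the CR
```

Using only the key works;  using `rest' as it stands passes the CR on
to whatever consumes it.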

> It seems a strange standard that prevents data interchange rather than
> facilitates it.

It is an excellent standard and ensures data interchange between the
hundreds of Unix text-processing tools that made Unix very popular.

> One article I read mentioned that errors can be caused by strict
> adherence to a standard so that the last line of data is ignored if it
> does not end with a newline.

Emacs is a non-Unix text editor that was ported to Unix, but kept its
foreign text-file format that uses line separators, not terminators.
Emacs users pollute the Unix filesystem with non-text files that have
trailing bytes beyond the zero-or-more Unix text lines unless they
change Emacs's defaults.  Unix programs are not at fault here, nor is
their `strict' adherence;  why bother with a standard otherwise?
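
To make `trailing bytes' concrete, here is a small Python sketch.  The
helper name is invented for the illustration.  It separates the
LF-terminated lines from whatever follows the last LF.

```python
def complete_lines_and_tail(data):
    """Return (lines, tail) for a byte string.

    lines: the LF-terminated lines, with the LF removed.
    tail:  any bytes after the last LF; empty for a POSIX text file.
    A hypothetical helper, written for this illustration only.
    """
    *lines, tail = data.split(b"\n")
    return lines, tail


for data in (b"a\nb\n", b"a\nb"):
    lines, tail = complete_lines_and_tail(data)
    print(data, "lines:", lines, "tail:", tail)
```

The second input holds one complete line followed by a stray `b'.  A
tool that deals only in lines is entitled to see just the one.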

> My approach would be to ignore the CR or to recognise three possible
> end of record markers, but it seems that is not allowed.

It certainly is allowed.  It would be the Richard Text File Format.
Programs could code to handle it.  More precise specification would be
required.  Is the first line-ending encountered from the start of the
file the one that must be used thence?  Or may each line differ?  RTFF
is incompatible with reading the POSIX `oneline' text file above so
conversion would be required, presumably by some means of escaping the
CR?  And escaping the escape character?  And converting files that were
already using the escape character to not mean escape.
https://xkcd.com/927/
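
A sketch of one possible answer, in Python.  It assumes each line may
end differently and that CRLF, CR, and LF all count as end of record.
Both choices are mine for the sake of illustration, not anything
specified above, and `lenient_records' is an invented name.

```python
import re

# Try CRLF first so that it is taken as one terminator, not two.
ANY_EOL = re.compile(rb"\r\n|\r|\n")


def lenient_records(data):
    """Split on CRLF, CR, or LF, allowing each record to differ.

    A hypothetical reader for the lenient format discussed above.
    """
    records = ANY_EOL.split(data)
    if records and records[-1] == b"":
        records.pop()  # nothing follows the final terminator
    return records


print(lenient_records(b"a\r\nb\nc\r"))         # mixed endings are accepted
print(lenient_records(b"one\rtwo\rthree\n"))   # the `oneline' file
```

The POSIX one-line file comes back as three records, which is the
incompatibility:  its CRs were data and there is no way left to say so
without inventing an escape.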

> I can exchange data in elaborate formats like DOCX, ODT or H.264 but I
> can't as a text file.

They are file formats designed for an application, or a standard for
interchange.  A text file is an operating-system file format, and they
show their lineage.  CP/M and DOS let the needs of the hardware leak
into the file format, wasting a byte on every line.  Before them, Unix,
really Multics, bettered that by having a logical representation and
leaving the hardware's needs to the device driver.

I understand macOS, the latest name for it, has moved to POSIX's format.

> What is the answer?  How can Windows and Linux exchange data in text
> files without requiring the user to run them through external
> conversion programs?

A program can code for one format everywhere, e.g. POSIX, but that then
depends on editors that stick with POSIX format on non-POSIX systems.
They exist and are becoming ever more common.  Or it can do as Python
has done and try to adapt on the fly, but that corrupts some POSIX text
files on reading, which is why Python lets you disable it.
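
For the Python case, here is a short sketch of the `newline' argument
to open() reading the same bytes as the `oneline' file.  `read_with' is
a throwaway helper invented for this example.

```python
import os
import tempfile


def read_with(data, newline):
    """Write data to a temporary file, then read its lines back in
    text mode with the given `newline` argument to open()."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "oneline")
        with open(path, "wb") as f:
            f.write(data)
        with open(path, encoding="ascii", newline=newline) as f:
            return f.readlines()


data = b"one\rtwo\rthree\n"

# Default: CR, LF, and CRLF all end a line and are translated to LF.
print(read_with(data, None))

# '': the same three endings still split lines, but are not translated.
print(read_with(data, ""))

# '\n': only LF ends a line, and nothing is translated.
print(read_with(data, "\n"))
```

Only the last form hands back the single line that was written.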

Because C has lines terminated by LF, like Unix, C libraries on Windows
convert CRLF to LF on reading a text file, and do the reverse on
writing.  Perl implements that too, with its `:crlf' I/O layer.
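
As a rough model of that translation only, and not of any particular C
library, here is a Python sketch.  Both function names are invented,
and any other text-mode behaviour is ignored.

```python
def text_mode_read(raw):
    """Model of reading in text mode: CRLF on disk becomes LF."""
    return raw.replace(b"\r\n", b"\n")


def text_mode_write(text):
    """Model of writing in text mode: LF becomes CRLF on disk."""
    return text.replace(b"\n", b"\r\n")


on_disk = b"one\r\ntwo\r\n"
in_program = text_mode_read(on_disk)
print(in_program)                   # the program sees LF-terminated lines
print(text_mode_write(in_program))  # and the CRLFs return on writing

# A lone CR is data and passes through untouched in both directions.
print(text_mode_read(b"one\rtwo\r\n"))
```

The program itself only ever deals in LF-terminated lines;  the CR
lives at the edge, in the library.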

-- 
Cheers, Ralph.
https://plus.google.com/+RalphCorderoy


