sld hates software: Linefeeds

From: sabrina downard
Date: 23:14 on 19 Jun 2005
Subject: Linefeeds

How many years have the various operating systems been doing their own 
thing with regard to what marks the end of a line?  Can we agree on 
something yet, please, for the love of bog?  I really don't feel that 
I'm asking for too much, here, it not being 1985 anymore and all.

hatefully,
--s.

p.s. Yes, iTunes -- exporting files that vi/awk/et al. on the same 
bedamned machine sees as two extremely long lines, and that only because 
one of the MP3s contained a comment which had a linefeed in it -- I'm 
looking at you.

From: Robert G. Werner
Date: 04:00 on 20 Jun 2005
Subject: Re: Linefeeds

sabrina downard wrote:
> How many years have the various operating systems been doing their own
> thing with regard to what marks the end of a line?  Can we agree on
> something yet, please, for the love of bog?  I really don't feel that
> I'm asking for too much, here, it not being 1985 anymore and all.
> 
> hatefully,
> --s.
> 
> p.s. Yes, iTunes -- exporting files that vi/awk/et al. on the same
> bedamned machine sees as two extremely long lines, and that only because
> one of the MP3s contained a comment which had a linefeed in it -- I'm
> looking at you.
> 
line\n\r
feeds\m
are^m
hard&return;

From: Dave Vandervies
Date: 16:17 on 20 Jun 2005
Subject: Re: Linefeeds

Somebody claiming to be Robert G. Werner wrote:
> 
> sabrina downard wrote:

> > p.s. Yes, iTunes -- exporting files that vi/awk/et al. on the same
> > bedamned machine sees as two extremely long lines, and that only because
> > one of the MP3s contained a comment which had a linefeed in it -- I'm
> > looking at you.
> > 
> line\n\r
> feeds\m
> are^m
> hard&return;

out=fopen("foo","w");
fputs("Nope, line\n",out);
fputs("feeds are\n",out);
fputs("actually\n",out);
fputs("really easy\n",out);
fclose(out);

Any system that runs general-purpose programs has a C I/O library that
knows exactly how to do line feeds for that system, and most non-C
languages either have C at the back-end anyways or can easily be
coerced to use the C library for I/O.
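
A minimal sketch of that point, assuming a hosted C implementation. A stream opened in text mode ("w") lets the library turn each "\n" into the host's own line ending, while a binary-mode read ("rb") shows the bytes that actually landed on disk. The helper names write_text_lines and file_size_binary are made up for illustration.

```c
#include <stdio.h>

/* Write two lines through a text-mode stream; the C library maps each "\n"
 * to whatever the host uses (LF on Unix, CR LF on DOS/Windows).
 * Returns 0 on success, -1 on failure. */
int write_text_lines(const char *path)
{
    FILE *out = fopen(path, "w");          /* "w" means text mode */
    if (out == NULL)
        return -1;

    int ok = fputs("line one\n", out) != EOF
          && fputs("line two\n", out) != EOF;
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Count the bytes stored on disk by reading in binary mode, where no
 * newline translation happens. Returns -1 if the file cannot be opened. */
long file_size_binary(const char *path)
{
    FILE *in = fopen(path, "rb");
    if (in == NULL)
        return -1;

    long n = 0;
    while (fgetc(in) != EOF)
        n++;
    fclose(in);
    return n;
}
```

The same source should give an 18-byte file where the line ending is a bare LF and a 20-byte file where it is CR LF; the program itself never has to know which.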

The hard part is finding a cluestick big enough for all of the people
who think all the world's a unix system and bypass the C stdio library
"for efficiency" because "It doesn't really matter, text and binary are
the same".  (Yeah, and the bugs and idiocies you're introducing are
really worth the few nanoseconds you save on millisecond-timed I/O
operations.)

The OP indicates that apparently not even all unix systems are unix in
this respect anymore...

dave

From: Michael G Schwern
Date: 02:21 on 22 Jun 2005
Subject: Re: Linefeeds

On Mon, Jun 20, 2005 at 11:17:26AM -0400, Dave Vandervies wrote:
> The OP indicates that apparently not even all unix systems are unix in
> this respect anymore...

OS X is a bit schizoid in that the Aqua things use the Mac newline (carriage
return, 015) and the Unixy things use the Unix newline (012).  They generally
do a decent job interacting with each other but vim and XEmacs need to get 
a clue.  vim at least tries to take a stab at auto-detecting what newline
style is being used but it thinks a file with mac newlines is a DOS formatted 
file (015 012).  They couldn't go that extra inch to see if the 015 is followed
by a 012.  XEmacs doesn't even try at all... at least I haven't found the 
variable to flip to make it try.
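
The "extra inch" could look something like the sketch below. This is not vim's actual detection code, only an illustration of checking whether each 015 (CR) is followed by a 012 (LF) before calling a file DOS-formatted. The enum and the function name detect_newline_style are hypothetical.

```c
#include <stddef.h>

enum newline_style { NL_NONE, NL_UNIX, NL_MAC, NL_DOS, NL_MIXED };

/* Classify the line endings in buf[0..len). A CR (015) counts as a DOS
 * ending only when an LF (012) immediately follows it; a lone CR is a
 * classic Mac ending and a lone LF is a Unix ending. */
enum newline_style detect_newline_style(const char *buf, size_t len)
{
    size_t lf = 0, cr = 0, crlf = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                crlf++;
                i++;                /* skip the LF that belongs to this CR */
            } else {
                cr++;
            }
        } else if (buf[i] == '\n') {
            lf++;
        }
    }

    int kinds = (lf > 0) + (cr > 0) + (crlf > 0);
    if (kinds == 0) return NL_NONE;
    if (kinds > 1)  return NL_MIXED;
    if (crlf > 0)   return NL_DOS;
    if (cr > 0)     return NL_MAC;
    return NL_UNIX;
}
```

A real editor would probably sample only the first block of the file and decide what to do about NL_MIXED; both are left out here.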

Oddly enough I haven't had an issue lately with transferring text around
between different machines.  I guess after 10 years of fairly ubiquitous
Interneting things are finally learning how to play nice with others.

From: Robert G. Werner
Date: 02:42 on 22 Jun 2005
Subject: Re: Linefeeds

Dave Vandervies wrote:
> Somebody claiming to be Robert G. Werner wrote:
> 
[snip]
> 
> The OP indicates that apparently not even all unix systems are unix in
> this respect anymore...
> 
> 
> dave
> 
I guess I was thinking more along the lines of Barbie's famous line about math ...

From: Peter da Silva
Date: 04:07 on 22 Jun 2005
Subject: Re: Linefeeds

> The OP indicates that apparently not even all unix systems are unix in
> this respect anymore...

Old-sk00l Carbon apps think they're running under Mac OS. The library 
even maps "Mac HD : Users : Peter" into "/users/peter" so they don't 
even see UNIX file names.

From: David Champion
Date: 20:06 on 25 Jun 2005
Subject: Re: Linefeeds

* On 2005.06.20, in <200506201517.IAA23114@xxxxxx.xxx>,
*	"Dave Vandervies" <dj3vande@xxxxxx.xxx> wrote:
> > > 
> > line\n\r
> > feeds\m
> > are^m
> > hard&return;

True.

> out=fopen("foo","w");
> fputs("Nope, line\n",out);
> fputs("feeds are\n",out);
> fputs("actually\n",out);
> fputs("really easy\n",out);
> fclose(out);

True.

The problem isn't in I/O, it's in protocol.  Everyone has a new and
improved way of indicating logical line breaks within their own
cross-platform specification.

Traditionally, the Intarweb uses MS-DOS line breaks, \r\n, for maximum
naive portability, while some specific platforms use either \r or \n
solo.  Each endpoint needs to be able to recognize what it's receiving
and match what it's sending.

I'm on a development team for an application -- a network listener
with a bunch of arbitrary purpose behind it -- where, mysteriously,
for reasons undiscovered, someone got \r\n backwards.  It issues line
breaks as \n\r.  This is fine if you're a raw terminal device, where the
order doesn't really matter, but if you're a client application, it
might.  And, in fact, the client I use most often doesn't recognize
\n\r as a line break; it recognizes it as two schizophrenic line breaks,
so I get everything in doublespace.  This has caused me some amount of
teeth-grinding.  I've had to turn vegetarian.
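
For what it's worth, a receiving end that wants to be tolerant could fold all four spellings into one, roughly as sketched here. The function name normalize_newlines is hypothetical, and the pairing rule (a break character followed by the *other* break character is one line break) is an assumption rather than anything a protocol promises.

```c
#include <stddef.h>

/* Copy src[0..len) into dst, turning every "\r\n", "\n\r", lone "\r" or
 * lone "\n" into a single "\n". dst must have room for len + 1 bytes (the
 * output never grows). Returns the bytes written, excluding the NUL. */
size_t normalize_newlines(const char *src, size_t len, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        char c = src[i];
        if (c == '\r' || c == '\n') {
            /* A different break character right after this one is taken
             * as the second half of a two-byte break and swallowed. */
            if (i + 1 < len
                && (src[i + 1] == '\r' || src[i + 1] == '\n')
                && src[i + 1] != c)
                i++;
            dst[out++] = '\n';
        } else {
            dst[out++] = c;
        }
    }
    dst[out] = '\0';
    return out;
}
```

With this rule "one\n\rtwo\n\r" comes out single-spaced. A stream that mixes conventions is still ambiguous ("\n\r\n" could be paired either way), so this is a heuristic, not a cure.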

> Any system that runs general-purpose programs has a C I/O library that
> knows exactly how to do line feeds for that system, and most non-C
> languages either have C at the back-end anyways or can easily be
> coerced to use the C library for I/O.

So, the trouble is it's not the host system, it's the interchange.
What about data representations where a logical newline is zero-width
whitespace, used exclusively to prettify presentation of metadata?  The
C library doesn't have a special XML mode, or a special LDAP mode, or a
special Joe's L33t RDBMS mode -- nor should it.  At some point you just
have to accept that your application needs to have a brain, and also
to use it.  Personally -- and I'll admit that I'm speaking as a UNIX
developer here -- I wish C didn't differentiate text and binary, not
because they're the same, but because there's more than just text and
binary in that big bad world, and it's not the C library's job to know
the difference.  It's just an illusion to think this alone is going to
save your ass.

> The hard part is finding a cluestick big enough for all of the people
> who think all the world's a unix system and bypass the C stdio library

Yeah, that's the Mac, right there.  (Not really.)

> The OP indicates that apparently not even all unix systems are unix in
> this respect anymore...

Who said anything about UNIX systems?  Maybe it's iTunes for Windows,
with Cygwin providing her %EDITOR% of choice.

Newlines are hard, and it's not UNIX's fault.

From: peter (Peter da Silva)
Date: 01:39 on 26 Jun 2005
Subject: Re: Linefeeds

> Newlines are hard, and it's not UNIX's fault.

In fact, UNIX followed the recommendations of the original ASCII standard
that if a single character was used for a line separator it should be
linefeed. Just about everyone else picked carriage return or both. Except
DEC, of course. RSX text files had a one or two byte length, followed by
an optional two byte line number, and the records themselves were either
padded to a multiple of 80 bytes or jammed together with no separator, and
blocks were either padded with nulls or lines could span block boundaries.

You have to look at the file type and mode to see which was which.

To read or write text files from Forth I gave up and called the Fortran
runtime.

From: Martin Ebourne
Date: 23:49 on 26 Jun 2005
Subject: Re: Linefeeds

On Sat, 2005-06-25 at 14:06 -0500, David Champion wrote:
> I'm on a development team for an application -- a network listener
> with a bunch of arbitrary purpose behind it -- where, mysteriously,
> for reasons undiscovered, someone got \r\n backwards.  It issues line
> breaks as \n\r.

Probably not the reason here, but that's the Acorn line break. There's a
good reason for it being that way round: it saved several bytes and
quite a few machine cycles on the old BBC micro.

Cheers,

Martin.

From: Jarkko Hietaniemi
Date: 06:57 on 27 Jun 2005
Subject: Re: Linefeeds

Martin Ebourne wrote:
> On Sat, 2005-06-25 at 14:06 -0500, David Champion wrote:
> 
>>I'm on a development team for an application -- a network listener
>>with a bunch of arbitrary purpose behind it -- where, mysteriously,
>>for reasons undiscovered, someone got \r\n backwards.  It issues line
>>breaks as \n\r.
> 
> 
> Probably not the reason here, but that's the Acorn line break. There's a
> good reason for it being that way round: it saved several bytes and
> quite a few machine cycles on the old BBC micro.

Ummm, could you elaborate?  ASM is fine in the explanation.

> Cheers,
> 
> Martin.
> 
>

From: Martin Ebourne
Date: 11:04 on 27 Jun 2005
Subject: Re: Linefeeds

Jarkko Hietaniemi <jhietaniemi@xxxxx.xxx> wrote:
> Martin Ebourne wrote:
>> Probably not the reason here, but that's the Acorn line break. There's a
>> good reason for it being that way round: it saved several bytes and
>> quite a few machine cycles on the old BBC micro.
>
> Ummm, could you elaborate?  ASM is fine in the explanation.

Well I'm a bit out of practice on the 6502. But something like this.

There are two main OS calls for writing a character to the screen:

OSWRCH - OS write character, the underlying call to write characters
OSASCI - Same as OSWRCH but with character translation. In fact, all 
it does is translate the 'official' Acorn line ending to the underlying 
characters (which become the 'alternative' Acorn line break; I should 
have been a bit clearer above), so it converts \r to \n\r. A key 
feature is that it returns with the accumulator untouched.

So in the OS syscall entry space we have:

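.osasci     CMP #13         \ is it a carriage return?
            BNE oswrch      \ no: write the character unchanged
            LDA #10         \ yes: write a line feed first
            JSR oswrch
            LDA #13         \ reload the CR and fall through to write it
.oswrch     JMP (<jump table vector address>)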

Clearly to write \r\n and still return with \r in the accumulator takes 
another JSR, LDA, and RTS. 6 more bytes and plenty of cycles.
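
The same control flow in C, as a rough model only. The names write_raw and write_translated and the screen buffer are made-up stand-ins for the two OS entry points and the display. Because the CR is the last thing written, the value handed back is the caller's original character with no extra work.

```c
#include <stddef.h>

static char screen[64];          /* hypothetical stand-in for the display */
static size_t screen_len;

void screen_reset(void) { screen_len = 0; screen[0] = '\0'; }
const char *screen_text(void) { return screen; }

/* Stand-in for the raw write call: append one character, untranslated. */
void write_raw(int c)
{
    if (screen_len + 1 < sizeof screen) {
        screen[screen_len++] = (char)c;
        screen[screen_len] = '\0';
    }
}

/* Stand-in for the translating call: a CR (13) goes out as LF (10) then CR.
 * Returning c mirrors "the accumulator comes back untouched". */
int write_translated(int c)
{
    if (c == 13)
        write_raw(10);           /* the extra line feed comes first */
    write_raw(c);                /* the fall-through: write the character itself */
    return c;
}
```

Emitting \r\n instead would mean writing the CR first, then the LF, and then restoring the CR before returning, which is the extra work described above.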

Of course, if Acorn had used \n for linebreaks instead of \r then the 
code above would trivially produce \r\n and everything would have 
matched up with both unix & dos so much better.

Cheers,

Martin.

From: Peter da Silva
Date: 12:01 on 27 Jun 2005
Subject: Re: Linefeeds

On Jun 27, 2005, at 5:04 AM, Martin Ebourne wrote:
> Of course, if Acorn had used \n for linebreaks instead of \r then the 
> code above would trivially produce \r\n and everything would have 
> matched up with both unix & dos so much better.

Not to mention actually following ASCII which specified two possible 
encodings for a new line, either "linefeed-CarriageReturn" or 
"newline", where "linefeed" and "newline" were alternate names for the 
same position (0x0A, 0/10). Using "<CR>" for a newline breaks the 
straightforward translation of FORTRAN carriage control:

If the first character is space, replace with linefeed.
If the first character is plus, delete it.
If the first character is 0, replace with linefeed-linefeed.
If the first character is 1, replace with formfeed.
Print the line followed by a carriage return.

This produces the correct result in either case. Using carriage return 
for newline breaks FORTRAN, and in 1963 that was a big no-no.
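
Those five rules fit in a few lines of C. This is a sketch under assumptions: the function name carriage_control is hypothetical, and the handling of an empty record or an unlisted control character (both treated like a space) is a guess, since the rules above don't cover them.

```c
#include <stddef.h>
#include <string.h>

/* Translate one FORTRAN print record (control character in column 1) into
 * the character stream described by the rules above. Returns the number of
 * bytes written, excluding the NUL, or 0 if out is too small. */
size_t carriage_control(const char *record, char *out, size_t outsz)
{
    const char *prefix;
    const char *text = record;

    switch (record[0]) {
    case ' ':  prefix = "\n";   text++; break;  /* single spacing */
    case '+':  prefix = "";     text++; break;  /* overprint: no advance */
    case '0':  prefix = "\n\n"; text++; break;  /* double spacing */
    case '1':  prefix = "\f";   text++; break;  /* new page */
    case '\0': prefix = "\n";           break;  /* assumption: blank line */
    default:   prefix = "\n";   text++; break;  /* assumption: like a space */
    }

    size_t need = strlen(prefix) + strlen(text) + 1;    /* +1 for the CR */
    if (need + 1 > outsz)
        return 0;

    strcpy(out, prefix);
    strcat(out, text);
    strcat(out, "\r");           /* every record ends with a carriage return */
    return need;
}
```

Feeding it " HELLO" and then "+_____" yields "\nHELLO\r" followed by "_____\r": the trailing CR parks the head at column one of the same row, so the underscores overprint HELLO. That only works if CR means "return the carriage" and nothing more.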

From: Michael G Schwern
Date: 23:25 on 27 Jun 2005
Subject: Re: Linefeeds

On Mon, Jun 27, 2005 at 11:04:51AM +0100, Martin Ebourne wrote:
> Of course, if Acorn had used \n for linebreaks instead of \r then the 
> code above would trivially produce \r\n and everything would have 
> matched up with both unix & dos so much better.

\r\n makes sense to me as a newline, historically.  It's a direct translation
of the commands to the line printer.  Move the head to the first column.  Move
down one row.  \n\r makes sense in the same way.  \n I can understand for
Unix, as by the early 70s working on displays rather than line printers was
more common and it was no longer necessary to give explicit commands.  Though
why they changed it... maybe they just wanted to save one character per line?

But why use \r?  \n I get, "move down one line" and moving back to the
first column is implicit.  But \r... "move back to the first column" and
going to the next line is implicit?  Doesn't seem right.

Unless, of course, they didn't consider 015 to be "carriage return" and 012
to be "newline"?

Generated at 12:27 on 27 Sep 2007 by mariachi