Rebolbot commented on Mar 11, 2010:
Submitted by: sqlab
Imported from: CureCode [ Version: alpha 97 Type: Bug Platform: All Category: Datatype Reproduce: Always Fixed-in: none ]
Imported from: metaeducation#1517
Comments:
Submitted by: meijeru
This is a feature, as far as I know.
Submitted by: sqlab
If I want to change line terminators, I can use ENLINE and DELINE. There is no string conversion that leaves the line terminators unchanged.
Submitted by: BrianH
I'm inclined to say that this is not a bug.
REBOL strings use "^/" as a line terminator internally. When you convert to REBOL strings, you convert to REBOL internal line termination. All of the REBOL functions that deal with strings expect REBOL line termination. Other line termination standards are an external matter, handled by the conversion routines that format the strings as binary: WRITE if you want something platform-specific, TO-BINARY if you don't. Use DELINE and ENLINE if you need to work around this (though see Oldes/Rebol-wishes#42).
If you want binary information conserved, then work in binary; don't convert to string. This will save you from the invalid UTF character conversion as well.
This issue was also raised on the Atronix R3 repository. I wrote a blog entry that summarizes my opinions on why we should be thinking about living in a world without CR LF:
http://blog.hostilefork.com/death-to-carriage-return/
Of the bug, I said:
I think the big mistake here is trying to take a real/actual/concrete problem and make it "invisible"...thus losing data without warning.
You can't wish away complexity, but you can ask it to go away. I'd suggest that Rebol favor the universe that Unix/POSIX/Linux (then OS X, and now Windows) seem to be going for, with just LF. Look at the move to line-feeds-only as a vote for the future...like using UTF-8 as an exchange medium.
So consider files or binaries with carriage returns in them to be a foreign format. Don't read them or write them without a special codec, the same way you'd need one for UCS-2 or anything else.
Then have the decoder offer options to preserve CR bytes, discard them, or give errors if they are found standalone rather than paired with an LF, in reverse order, etc. All the lovely issues you get from the two-character sequence.
It might seem tempting to just say that if you manage to get a string into the system with CR in it, you should be able to write it out. But I'd say the default UTF-8 encoder used and standardized by the system should be picky too. Given how much of Rebol's common assumption (and the assumption we'd like to be able to make systemically) is that newline is all you need, if you didn't filter your newlines out you will be getting a mixture most of the time.
So...
Make a strong decision about the default: LF is favored by everyone these days but Notepad, and it's better to help facilitate living in that world.
Standardize that when Rebol files are exchanged over the network they will not have CRLF in them. Don't load source that contains it unless a special command line switch or mode is set...default is OFF. (I feel the same way about tabs.) No matter what tolerance these modes give, do not let string literals have the "bad" characters in them.
If someone is working in a hybrid environment where their data files do have CR in them, be noisy. Don't read as strings or write back out with CR unless they really know what they are doing and demand it. Make it as easy as feasible to demand, and give guidance...but make it clear that the native tongue is no-CR.
Other characters that should be excluded would be things like the BOM (Byte-Order-Mark), which is basically a bug if it appears in UTF-8 data, most of the time.
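A minimal sketch of the "picky decoder" idea from the quoted proposal, written in Python rather than Rebol so it can be run as-is. The function name `decode_strict_lf` and its `cr` modes are hypothetical names chosen for illustration; nothing here is an existing Rebol function or a description of how R3 behaves. It only shows one way the options listed above (preserve CR, convert CR LF pairs, error on stray CR, reject a BOM) could fit together:

```python
import codecs


def decode_strict_lf(data: bytes, cr: str = "error") -> str:
    """Decode UTF-8 bytes, treating CR and a leading BOM as foreign.

    cr = "error"    -> raise if any CR is present (the noisy default)
    cr = "convert"  -> turn CR LF pairs into LF, raise on any other CR
    cr = "preserve" -> keep CR characters exactly as they arrived
    """
    if cr not in ("error", "convert", "preserve"):
        raise ValueError(f"unknown cr mode: {cr!r}")

    # A byte-order mark at the start of UTF-8 data is rejected outright.
    if data.startswith(codecs.BOM_UTF8):
        raise ValueError("UTF-8 data starts with a byte-order mark")

    # Raises UnicodeDecodeError (a ValueError subclass) on invalid UTF-8.
    text = data.decode("utf-8")

    if cr == "preserve":
        return text
    if cr == "convert":
        text = text.replace("\r\n", "\n")

    # Anything left is a standalone CR, or LF CR in reverse order.
    if "\r" in text:
        kind = "standalone CR" if cr == "convert" else "CR"
        raise ValueError(f"{kind} found in input")
    return text
```

With these assumed modes, `decode_strict_lf(b"a\r\nb", cr="convert")` gives `"a\nb"`, while the same call with the default mode raises, so the caller has to ask for CR handling explicitly instead of losing data silently.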