Discussion:
EOL type detection
JJ
2022-03-05 08:39:26 UTC
For example, the code below creates two files, `unix` and `dos`. Each is written
with two lines: `123`, then `abc`. The `unix` file uses LF as its EOL, and the
`dos` file uses CRLF.

FileSystemObject apparently supports both EOL formats when reading a file, as
shown in the code below. It properly reads each line as having 3 characters.

[code]
set fs = createobject("scripting.filesystemobject")

set funix = fs.createtextfile("unix")
funix.write "123" & vblf & "abc" & vblf
funix.close

set fdos = fs.createtextfile("dos")
fdos.write "123" & vbcrlf & "abc" & vbcrlf
fdos.close

set funix = fs.opentextfile("unix")
unix1 = funix.readline
unix2 = funix.readline
funix.close
wsh.echo "unix file size = " & fs.getfile("unix").size
wsh.echo "unix1: len=" & len(unix1) & ", str='" & unix1 & "'"
wsh.echo "unix2: len=" & len(unix2) & ", str='" & unix2 & "'"
fs.deletefile "unix"

set fdos = fs.opentextfile("dos")
dos1 = fdos.readline
dos2 = fdos.readline
fdos.close
wsh.echo "dos file size = " & fs.getfile("dos").size
wsh.echo "dos1: len=" & len(dos1) & ", str='" & dos1 & "'"
wsh.echo "dos2: len=" & len(dos2) & ", str='" & dos2 & "'"
fs.deletefile "dos"
[/code]

The problem is that there doesn't seem to be a way to retrieve the EOL type
detected by FileSystemObject, so we have to implement the EOL detection
ourselves. Currently I'm doing it like below, but IMO it's a bit tedious.
So, is there a simpler method?

[code]
set f = fs.opentextfile("thefile")
s = f.readline
f.close
set f = fs.opentextfile("thefile")
f.skip len(s)
on error resume next
doseol = f.read(1) = vbcr
if err.number <> 0 then doseol = 1 'read(1) failed: no EOL after the first line
f.close
'doseol: 0=unix, -1=dos, 1=unknown (no EOL in file)
[/code]
Mayayana
2022-03-05 13:57:42 UTC
"JJ" <***@gmail.com> wrote

| The problem is that, there doesn't seem to be a way to retrieve the EOL type
| detected by FileSystemObject. In this case, we'd have to manually implement
| the EOL detection ourselves. Currently, I'm doing it like below, but IMO,
| it's a bit tedious. So, is there a simpler method?
|

I take the opposite approach. Having different line returns in
different files gets complicated, and some text file readers
don't recognize them. (My Notepad shows continuous text
with boxes in place of the line returns.)

So I have a VBScript on my desktop.
If I download something like code samples using Unix line returns
I just drop the file or folder on my script. First the script checks
to see whether it's a file or folder, then it does the following:

Sub FixFol(FolPath)
Dim SubPath, s2, sExt
Set oFol = FSO.GetFolder(FolPath)
Set oFils = oFol.Files
For Each oFil in oFils
FPath = oFil.Path
sExt = UCase(Right(FPath, 3))
Select Case sExt
Case "TXT", ".MD", "TML", ".JS", "SON", "CSS", ".CS", "BAT", ".PY"
FixReturnsFile FPath
iCount = iCount + 1
Case Else
'--avoid touching binary files.
End Select
Next
Set oFils = Nothing

Set Fols = oFol.SubFolders
If Fols.count > 0 Then
For Each Fol in Fols
SubPath = Fol.Path
FixFol SubPath
Next
End If
Set Fols = Nothing
Set oFol = Nothing
End Sub


Sub FixReturnsFile(sPath)
On Error Resume Next
Set TS = FSO.OpenTextFile(sPath, 1, False)
s = TS.ReadAll
TS.Close
Set TS = Nothing

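'-------- normalize every line ending to vbCrLf ------------------------
s1 = Replace(s, vbCrLf, vbCr, 1, -1, 0)   '-- CRLF -> CR, so existing pairs are not doubled
s1 = Replace(s1, vbLf, vbCr, 1, -1, 0)    '-- bare LF -> CR
s1 = Replace(s1, vbCr, vbCrLf, 1, -1, 0)  '-- every CR -> CRLF
s1 = Replace(s1, vbCr & vbCr, vbCr, 1, -1, 0)  '-- finds nothing: each CR is now followed by LF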

'-- -----write file. -----------------
If FSO.fileexists(sPath) = True Then
FSO.deletefile sPath, True
End If
Set TS = FSO.CreateTextFile(sPath, True)
TS.Write s1
TS.Close
Set TS = Nothing
End Sub
JJ
2022-03-05 23:27:47 UTC
Post by Mayayana
And some text file readers
don't recognize them. (My Notepad shows continuous text
with boxes to show line returns.)
Well, the opposite is also true. There is software which doesn't like *nix
EOLs, and there is also software which doesn't like DOS EOLs.
Post by Mayayana
If I download something like code samples using Unix line returns
I just drop the file or folder on my script. First the script checks
[snip]

It's good that it works for you, but in my case, I need to preserve the EOL
format.
R.Wieser
2022-03-05 14:49:07 UTC
JJ,
Post by JJ
Currently ... it's a bit tedious. So, is there a simpler method?
Yes, but it comes at a cost:

[code]
set oFile = oFS.OpenTextfile(sFile)
sData = oFile.ReadAll
oFile.Close

p1=instr(sData,vbCR)
p2=instr(sData,vbLF)
[/code]

Now all you have to do is to compare p1 and p2 to get the types.
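
One possible way to do that comparison, as a minimal sketch. `EolType` is a hypothetical helper name, not something from the posts above, and it only looks at the first CR and the first LF, so a file that starts with CRLF and switches style later still reports "dos".

```vbscript
' Hypothetical helper: classify the EOL type from the two InStr results.
' p1 = position of the first CR, p2 = position of the first LF (0 = not found).
Function EolType(p1, p2)
  If p1 = 0 And p2 = 0 Then
    EolType = "none"     ' no EOL anywhere in the data
  ElseIf p1 = 0 Then
    EolType = "unix"     ' LF only
  ElseIf p2 = 0 Then
    EolType = "mac"      ' CR only
  ElseIf p2 = p1 + 1 Then
    EolType = "dos"      ' first CR is immediately followed by the first LF
  Else
    EolType = "mixed"    ' a bare CR or a bare LF comes before the other character
  End If
End Function

' Usage with the variables from the snippet above:
' WScript.Echo EolType(p1, p2)
```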

Apart from "malformed" files (mixing different EOLs) VBScript only seems to
have a problem with CR-only EOL's (reads everything as a single line).

Regards,
Rudy Wieser
JJ
2022-03-05 23:28:59 UTC
Post by R.Wieser
[code]
set oFile = oFS.OpenTextfile(sFile)
sData = oFile.ReadAll
oFile.Close
p1=instr(sData,vbCR)
p2=instr(sData,vbLF)
[/code]
Now all you have to do is to compare p1 and p2 to get the types.
That's a cost which I can't afford, since the input file sometimes can be
pretty large.
Post by R.Wieser
Apart from "malformed" files (mixing different EOLs) VBScript only seems to
have a problem with CR-only EOL's (reads everything as a single line).
AFAIK, Windows has never supported Mac's EOL.
R.Wieser
2022-03-06 09:06:02 UTC
JJ,
Post by JJ
That's a cost which I can't afford, since the input file sometimes
can be pretty large.
You already have to: a 'ReadLine' will cause a vbCR EOL file to be read all
at once. Which of course will also happen with files which simply do not
have EOLs.

But if the storage size is what would be the problem you could just read the
stream character-by-character up until you found both the vbCR and vbLF
chars (or reach EOF). Instead of the cost being storage space it would be
time.
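
A minimal sketch of that character-by-character scan. `DetectEol` is a hypothetical name, and as a simplification it stops at the first EOL it meets instead of reading on until both characters have been seen, so it assumes the file does not mix EOL styles.

```vbscript
' Hypothetical helper: scan a file one character at a time and stop at the
' first EOL. Returns "dos", "unix", "mac" or "none".
Function DetectEol(sPath)
  Dim oFS, oFile, c
  Set oFS = CreateObject("Scripting.FileSystemObject")
  Set oFile = oFS.OpenTextFile(sPath, 1)   ' 1 = ForReading
  DetectEol = "none"
  Do Until oFile.AtEndOfStream
    c = oFile.Read(1)
    If c = vbLf Then
      DetectEol = "unix"
      Exit Do
    ElseIf c = vbCr Then
      DetectEol = "mac"                    ' bare CR unless an LF follows
      If Not oFile.AtEndOfStream Then
        If oFile.Read(1) = vbLf Then DetectEol = "dos"
      End If
      Exit Do
    End If
  Loop
  oFile.Close
End Function
```

Only the first line is ever read, so memory use stays small no matter how large the file is.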
Post by JJ
AFAIK, Windows has never supported Mac's EOL.
Have you already tried to create such a file and open it with write.exe ? I
did, and it looks pretty normal. Not so much when using (XP's) notepad
though.

Also, if you choose not to support such Mac-style files you will still need
to be able to recognise them - even just so you can reject them. :-)

Regards,
Rudy Wieser
JJ
2022-03-07 22:36:55 UTC
Post by R.Wieser
You already have to: a 'ReadLine' will cause a vbCR EOL file to be read all
at once. Which of course will also happen with files which simply do not
have EOLs.
Fortunately, none of my input files will be like that.
Post by R.Wieser
But if the storage size is what would be the problem you could just read the
stream character-by-character up until you found both the vbCR and vbLF
chars (or reach EOF). Instead of the cost being storage space it would be
time.
That is another way to do it, but the disadvantage outweighs the benefit.
Post by R.Wieser
Have you already tried to create such a file and open it with write.exe ? I
did, and it looks pretty normal. Not so much when using (XP's) notepad
though.
That's surprising. I didn't expect WordPad to actually support Mac EOL. But
that's probably the only part of Windows which supports Mac EOL. FYI, the old
(16-bit) Write doesn't support Mac EOL.
Post by R.Wieser
Also, if you choose not to support such Mac-style files you will still need
to be able to recognise them - even just so you can reject them. :-)
No need in my case, because if there's actually one which uses Mac EOL, it
will fail the data validation which is done later in the script anyway.
R.Wieser
2022-03-08 10:03:08 UTC
JJ,
Post by JJ
Fortunately, none of my input files will be like that.
:-) And there I was, thinking that that "on error resume next" was the
result of defensive programming.
Post by JJ
That is another way to do it, but the disadvantage outweighs the benefit.
As I said, it comes with a cost. It's up to you to decide if it's worth it.

But do realize that that character-by-character reading of the file is
buffered internally, so the performance hit is probably less than you might
think.

Also, if you know what the input file looks like you could choose to just
'read(xxx)' the first few KB - a few lines worth - and determine the EOL
type from it.
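
As a rough sketch of that idea: `DetectEolFromHead` and its `nChars` parameter are hypothetical names, and the result is only as good as the assumption that the first `nChars` characters contain at least one complete line.

```vbscript
' Hypothetical helper: decide the EOL type from the first nChars characters.
' Returns "dos", "unix", "mac" or "none" (no EOL within the chunk).
Function DetectEolFromHead(sPath, nChars)
  Dim oFS, oFile, sHead, p1, p2
  Set oFS = CreateObject("Scripting.FileSystemObject")
  Set oFile = oFS.OpenTextFile(sPath, 1)          ' 1 = ForReading
  sHead = ""
  If Not oFile.AtEndOfStream Then sHead = oFile.Read(nChars)
  ' If the chunk ends in CR, fetch one more character so that a CRLF pair
  ' split at the chunk boundary is not mistaken for a bare CR.
  If Right(sHead, 1) = vbCr Then
    If Not oFile.AtEndOfStream Then sHead = sHead & oFile.Read(1)
  End If
  oFile.Close

  p1 = InStr(sHead, vbCr)                         ' first CR, 0 if none
  p2 = InStr(sHead, vbLf)                         ' first LF, 0 if none
  If p1 = 0 And p2 = 0 Then
    DetectEolFromHead = "none"
  ElseIf p1 = 0 Or (p2 > 0 And p2 < p1) Then
    DetectEolFromHead = "unix"                    ' first break is a bare LF
  ElseIf p2 = p1 + 1 Then
    DetectEolFromHead = "dos"                     ' first break is CRLF
  Else
    DetectEolFromHead = "mac"                     ' first break is a bare CR
  End If
End Function

' Usage: WScript.Echo DetectEolFromHead("thefile", 4096)
```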
Post by JJ
That's surprising. I didn't expect WordPad to actually support Mac EOL.
I didn't say it /supports/ Mac EOLs, just that it /displays/ them in an expected,
readable way. No idea what happens when you edit and then save such a file
(didn't go as far as to test that). It could easily become one of those
frankenmonster files, mixing different types of EOLs ... :-\
Post by JJ
No need in my case, because if there's actually one which uses Mac
EOL, it will fail the data validation which is done later in the
script anyway.
In that case, are you sure you actually want to preserve the EOLs in the
file and not just (temporarily?) reformat the file in question ? It could
result in something parsable.

Then again, I get the feeling that speed is of the utmost importance to you
and everything else comes second - even getting results.

Regards,
Rudy Wieser
JJ
2022-03-09 02:30:52 UTC
Post by R.Wieser
I didn't say it /supports/ Mac EOLs, just that it /displays/ them in an expected,
readable way. No idea what happens when you edit and then save such a file
(didn't go as far as to test that). It could easily become one of those
frankenmonster files, mixing different types of EOLs ... :-\
Just to give a report, Mac EOLs are saved as DOS EOLs by Wordpad.
Post by R.Wieser
In that case, are you sure you actually want to preserve the EOLs in the
file and not just (temporarily?) reformat the file in question ? It could
result in something parsable.
Yes, I need to physically preserve the EOL type. The EOLs are just data
separators and only the line contents are meaningful. However, if the file is
modified, its original EOL type must be preserved.
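
A minimal sketch of writing a file back with its original EOL, assuming the lines are already in an array and the EOL string was detected earlier. `WriteLines`, `aLines` and `sEol` are hypothetical names. It uses `Write` with an explicit terminator because `WriteLine` always appends CRLF.

```vbscript
' Hypothetical helper: write each line followed by the given EOL string
' (vbLf or vbCrLf) so the rewritten file keeps its original EOL type.
Sub WriteLines(sPath, aLines, sEol)
  Dim oFS, oFile, i
  Set oFS = CreateObject("Scripting.FileSystemObject")
  Set oFile = oFS.CreateTextFile(sPath, True)   ' True = overwrite existing file
  For i = 0 To UBound(aLines)
    oFile.Write aLines(i) & sEol
  Next
  oFile.Close
End Sub

' Usage: WriteLines "thefile", Array("123", "abc"), vbLf
```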
Post by R.Wieser
Then again, I get the feeling that speed is of the utmost importance to you
and everything else comes second - even getting results.
Functional code comes first, because software is useless if it's fast
but broken. That is the standard for any programmer in their right mind.
Performance, memory usage, and code size come next, depending on the
requirements. At least, for me.
