Discussion:
Something rather odd: results change by adding blank lines to the code
R.Wieser
2018-02-28 11:15:26 UTC
Hello All,

I'm writing some script to check if a certain file does in fact contain
unicode, or is just ASCII formatted that way. The code is nothing special,
just using a ReadAll to get the file contents, and then a function which
checks the first two chars (&hFF, &hFE) and a loop checking each second char
(&h00).

The odd thing is that I got back that one of those &h00 checks failed, when
I could see (using a hex editor) that all of those were in fact zero.

When I added a wscript.echo to show the index and the contents the problem
disappeared. When I removed the line again the problem came back. Huh ?

It became stranger though: when I just placed an empty line there the
problem went away too.

And then sometimes adding an empty line somewhere else (regardless of the
middle, begin or end of the script) made the location of the failed check
change - or disappear.

I'm confused, and worried: what is going on here ?

Regards,
Rudy Wieser
Wally W.
2018-02-28 13:05:43 UTC
Can't say much without seeing some code.

Some languages don't play well with the nul character, &h00.

VBScript could be one of them.

Don't know why the location of your extra line wouldn't matter if
VBScript is getting a &h00 stuck somewhere.

What if you change the file so there are &h01 characters where there
are now &h00 characters. It will be a gibberish file, but that isn't
the point right now. Then look for &h01 with and without your extra
line.

Same result?

Same result when opening the file as a different format?

<https://msdn.microsoft.com/de-de/library/314cz14s(v=vs.84).aspx>
Optional. One of three Tristate values used to indicate the format of
the opened file (TristateTrue = -1 to open the file as Unicode,
TristateFalse = 0 to open the file as ASCII, TristateUseDefault = -2
to open the file as the system default). If omitted, the file is
opened as ASCII.
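
For reference, a minimal sketch of passing that format argument explicitly. The file name "test.txt" is only a placeholder, and the constants are declared by hand because VBScript does not predefine them:

```vbscript
Option Explicit

' Values for the iomode and format arguments of OpenTextFile.
Const ForReading = 1
Const TristateUseDefault = -2   ' system default
Const TristateTrue = -1         ' open as Unicode
Const TristateFalse = 0         ' open as ASCII

Dim fso, ts, firstChars
Set fso = CreateObject("Scripting.FileSystemObject")

' "test.txt" is a placeholder path. Swap TristateFalse for TristateTrue
' to compare how the same file reads in the other mode.
Set ts = fso.OpenTextFile("test.txt", ForReading, False, TristateFalse)
If Not ts.AtEndOfStream Then
    firstChars = ts.Read(2)
End If
ts.Close
```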
R.Wieser
2018-02-28 14:14:42 UTC
Wally,
Post by Wally W.
Can't say much without seeing some code.
Well, your "not much" is quite enough in this case.
Post by Wally W.
Some languages don't play well with the nul character, &h00.
VBScript could be one of them.
Which is most likely the problem. As it turns out (I kept trying to find
solutions) I stumbled over a google result which mentioned that ReadAll
flunks out when trying to read "binary" content (and even though
'opentextfile' says it can handle unicode, 'ReadAll' apparently cannot), and
returns a buffer which is as large as the to-be-read file, but only filled
with it up to the first zero .... with the rest filled with whatever was in
memory when the string was allocated.
Post by Wally W.
Don't know why the location of your extra line wouldn't matter if
VBScript is getting a &h00 stuck somewhere.
My thoughts exactly.

But see above. It looks like it was just my changing of the source file's
size which made the string buffer get allocated from another location, and
thus with different garbage.
Post by Wally W.
Same result when opening the file as a different format?
Yep, tried that. No difference. Same wonky behaviour.

Regards,
Rudy Wieser
JJ
2018-02-28 13:25:50 UTC
Specify the ASCII format when opening the file. Otherwise, by default, WSH
will try to detect the file format, which is not 100% guaranteed to be
accurate.
Wally W.
2018-02-28 13:38:10 UTC
Post by JJ
Specify the ASCII format when opening the file. Otherwise, by default, WSH
will try to detect the file format, which is not 100% guaranteed to be
accurate.
This doesn't address the reported behaviour.

If it is opened in the wrong mode, why should his extra line matter?

One troubling thing is: what R.Wieser is seeing may be a VBScript bug
that no one else has noticed and may be doing other bad things.

Another potentially troubling thing is: M$ may "fix" it by walking
away from VBScript altogether; telling people they shouldn't want to
use it. They have a habit of leaving broken software in their wake,
you know.
R.Wieser
2018-02-28 13:55:29 UTC
JJ,
Post by JJ
Specify the ASCII format when opening the file.
Yeah, tried that too. Didn't work though. That's when I noticed the rather
bizarre behaviour.
Post by JJ
Otherwise, by default, WSH will try to detect the file format, which
is not 100% guaranteed to be accurate.
Are you sure about that detecting bit ? The 'opentextfile' docs only allow
me to select either ANSI or Unicode (or use the system's default, which, in
my case, seems to be ANSI) ...


But, the cause has been found !

According to my stumbled-over google results ReadAll does not play well with
non-ANSI content, and although it creates a string large enough to hold the
file's contents, it only fills it partially (up to the first zero I presume)
leaving the rest of the buffer untouched and filled with .. whatever was
there when the string space was allocated. Hence the changing results.

Regards,
Rudy Wieser
Mayayana
2018-02-28 15:01:46 UTC
"R.Wieser" <***@not.available> wrote

| I'm writing some script to check if a certain file does in fact contain
| unicode, or is just ASCII formatted that way. The code is nothing special,
| just using a ReadAll to get the file contents, and then a function which
| checks the first two chars (&hFF, &hFE) and a loop checking each second
| char (&h00).
|
Use Read(x) instead of ReadAll. Microsoft,
in their wisdom, were trying to protect us from
binary files. ReadAll stops at the first null.

'---------------------------------
' TS is assumed to be a TextStream already opened for reading.
s = TS.Read(100)
A = GetArray(s)

' A is now a byte array that's accessible and you
' can walk it to check for nulls.

' Convert each character of sStr to its character code.
Function GetArray(sStr)
    Dim iA, Len1, AStr()
    On Error Resume Next
    Len1 = Len(sStr)
    ReDim AStr(Len1 - 1)
    For iA = 1 To Len1
        AStr(iA - 1) = Asc(Mid(sStr, iA, 1))
    Next
    GetArray = AStr
End Function
R.Wieser
2018-02-28 16:16:38 UTC
Mayayana,
Post by Mayayana
Microsoft, in their wisdom, were trying to protect us from
binary files. ReadAll stops at the first null.
Too bad that they forgot to either zero out the rest of the string, or shrink
it to fit the actual read data ...

... apart from not mentioning that stop-at-first-zero in their documentation
of course. :-(
Post by Mayayana
Use Read(x) instead of ReadAll.
Didn't think of doing it that way. Instead I just read the file
byte-by-byte, doing the checking along the way (as well as gathering the
ASCII contents).

Regards,
Rudy Wieser
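
A byte-by-byte check along those lines might look roughly like the sketch below. It is not the script from this thread: the function name and the "test.txt" path are made up for illustration, and it assumes a single-byte ANSI code page, so that opening the file as ASCII and calling Read(1) yields one character per byte and Asc() gives that byte's value back.

```vbscript
Option Explicit

Const ForReading = 1
Const TristateFalse = 0   ' open as ASCII so each Read(1) returns one byte

' Hypothetical helper: returns True when the file starts with &hFF &hFE
' and every second byte after that marker is &h00.
Function LooksLikeAsciiInUnicode(sPath)
    Dim fso, ts, pos, b, ok
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set ts = fso.OpenTextFile(sPath, ForReading, False, TristateFalse)
    pos = 0
    ok = True
    Do Until ts.AtEndOfStream
        b = Asc(ts.Read(1))
        If pos = 0 Then
            If b <> &hFF Then ok = False
        ElseIf pos = 1 Then
            If b <> &hFE Then ok = False
        ElseIf pos Mod 2 = 1 Then
            ' Odd offsets hold the high byte of each 16-bit character.
            If b <> 0 Then ok = False
        End If
        If Not ok Then Exit Do
        pos = pos + 1
    Loop
    ts.Close
    ' Also require that the two marker bytes were present and that the
    ' file has an even number of bytes.
    LooksLikeAsciiInUnicode = ok And pos >= 2 And (pos Mod 2 = 0)
End Function

' "test.txt" is a placeholder path.
WScript.Echo LooksLikeAsciiInUnicode("test.txt")
```

Reading one byte at a time is slow on large files, but it never depends on what ReadAll does with zero bytes.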