ADODB.Stream binary array to binary string failed unless x-user-defined is used

Discussion:


JJ

2019-09-16 01:23:16 UTC

I'm writing a slimmed-down Base64 decoder based on the code from the page
below (long URL warning).

<https://www.maxvergelli.com/base64-encoding-decoding-functions-in-vbscript-classic-asp/>

However, there's a problem when converting the binary byte array to a binary
string. If I use the `us-ascii` character set, the most significant bit is
stripped from every byte and the result becomes 7-bit data. The only way I've
found to keep all 8 bits is the `x-user-defined` character set. Did I miss
something? Below is my code.

'load base64 text from input file into string
set fs = createobject("scripting.filesystemobject")
set f = fs.opentextfile("input.b64")
s = f.readall
f.close
'convert base64 text string to binary byte array
set x = createobject("msxml2.domdocument.3.0")
set n = x.createelement("z")
n.datatype = "bin.base64"
n.text = s
d = n.nodetypedvalue
'convert binary byte array to binary string
set ds = createobject("adodb.stream")
ds.type = 1 'binary
ds.open
ds.write d
ds.position = 0
ds.type = 2 'text
ds.charset = "us-ascii" '!!result is 7-bit!!
'ds.charset = "x-user-defined" 'can only use this to produce 8-bit
s = ds.readtext
set f = fs.createtextfile("output.bin")
f.write s
f.close

The `input.b64` test file is below. It's 256 bytes of ASCII, from 0x00 to
0xFF.

AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygpKissLS4vMDEyM
zQ1Njc4OTo7PD0+P0BBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWltcXV5fYGFiY2RlZm
doaWprbG1ub3BxcnN0dXZ3eHl6e3x9fn+AgYKDhIWGh4iJiouMjY6PkJGSk5SVlpeYmZq
bnJ2en6ChoqOkpaanqKmqq6ytrq+wsbKztLW2t7i5uru8vb6/wMHCw8TFxsfIycrLzM3O
z9DR0tPU1dbX2Nna29zd3t/g4eLj5OXm5+jp6uvs7e7v8PHy8/T19vf4+fr7/P3+/w==

Mayayana

2019-09-16 03:03:38 UTC

"JJ" <***@vfemail.net> wrote

| I'm writing a slimmed down version of Base64 decoder based on the code of
| below page (long URL warning).
|

You don't need all that stuff. Base-64 is actually a fairly
simple math operation. There's no need to call in a special
component to do that part.
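
To illustrate the "simple math" point, here is a minimal sketch of Base64 encoding in plain VBScript. It is not the script from the desktop pack mentioned in this post; the function name `Base64FromByteValues` and the choice to take an array of numbers (0 to 255) as input are assumptions made for the example, so that no codepage conversion is involved.

```vbscript
Option Explicit

' Illustrative sketch: encode an array of byte values (0..255) as Base64.
' Every 3 input bytes form a 24-bit number that is split into four 6-bit
' indexes into the Base64 alphabet; "=" pads an incomplete final group.
Function Base64FromByteValues(vals)
    Const ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    Dim i, n, b1, b2, last, out
    last = UBound(vals)
    out = ""
    For i = 0 To last Step 3
        b1 = 0
        b2 = 0
        If i + 1 <= last Then b1 = vals(i + 1)
        If i + 2 <= last Then b2 = vals(i + 2)
        n = CLng(vals(i)) * 65536 + CLng(b1) * 256 + b2
        out = out & Mid(ALPHA, (n \ 262144) + 1, 1)
        out = out & Mid(ALPHA, ((n \ 4096) And 63) + 1, 1)
        If i + 1 <= last Then
            out = out & Mid(ALPHA, ((n \ 64) And 63) + 1, 1)
        Else
            out = out & "="
        End If
        If i + 2 <= last Then
            out = out & Mid(ALPHA, (n And 63) + 1, 1)
        Else
            out = out & "="
        End If
    Next
    Base64FromByteValues = out
End Function

WScript.Echo Base64FromByteValues(Array(77, 97, 110))  ' TWFu
WScript.Echo Base64FromByteValues(Array(77, 97))       ' TWE=
WScript.Echo Base64FromByteValues(Array(77))           ' TQ==
```

String concatenation in a loop like this is the slow part for large inputs, which is consistent with the timings discussed later in the thread.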

See the desktop pack here:

https://www.jsware.net/jsware/scrfiles.php5#desk

I call it that because it's various things I keep on the desktop.
One of them is a drag/drop script that will convert a file to
or from Base-64. It also handles conversion to/from email
format that uses a return every 76 characters.

|
| The `input.b64` test file is below. It's a 256 bytes ASCII from 0x00 to
| 0xFF.
|
| AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygpKissLS4vMDEyM
| zQ1Njc4OTo7PD0+P0BBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWltcXV5fYGFiY2RlZm
| doaWprbG1ub3BxcnN0dXZ3eHl6e3x9fn+AgYKDhIWGh4iJiouMjY6PkJGSk5SVlpeYmZq
| bnJ2en6ChoqOkpaanqKmqq6ytrq+wsbKztLW2t7i5uru8vb6/wMHCw8TFxsfIycrLzM3O
| z9DR0tPU1dbX2Nna29zd3t/g4eLj5OXm5+jp6uvs7e7v8PHy8/T19vf4+fr7/P3+/w==

That works fine decoding with my script. Though I wouldn't call
it "256 bytes ascii". First, ASCII only includes up to byte 127. 8-bit,
one-byte-per-character text is ANSI. But since the output is a
non-specific binary file, it's really a 256-byte file with bytes from
0 to 255 in ascending order. If you call it a txt file it will be an
ANSI file with characters dependent on the local codepage. But
nothing makes it an ANSI file.

GS

2019-09-16 03:58:01 UTC

Post by Mayayana
You don't need all that stuff. Base-64 is actually a fairly
simple math operation. There's no need to call in a special
component to do that part.

That's a very concise method; it duplicates David Ireland's Radix 64 method
exactly, both ways. Nice job!!!

--
Garry

Free usenet access at http://www.eternal-september.org
Classic VB Users Regroup!
comp.lang.basic.visual.misc
microsoft.public.vb.general.discussion

Mayayana

2019-09-16 14:08:58 UTC

"GS" <***@v.invalid> wrote

| That's a very concise method; - it duplicates David Ireland's Radix 64 method
| exactly both ways. Nice job!!!
|

David Ireland? Radix64? I looked it up but only seem to
be finding a pastor.

I came up with it because many years ago I was writing
SMTP software and needed to incorporate Base64 operations.
I'm not sure where I got the code but it was probably
from VBSpeed. That's where I've always gone for high
efficiency operations.

Later I added Base64 to a binary
functions DLL for VBS, but I also wanted to provide as much
code as possible that didn't require my 3rd-party DLL. With
that I worked out that Textstream could actually handle
any binary operation if it's done in the right way. So then
I just adapted the VB code for VBS, adding it to a VBS binary
class, which is also online. The class includes sample scripts to
do things like lighten a BMP image, read file headers, and
do Base64 conversion -- all with only Textstream. I also use
those methods for scripts that extract icons from PE files
and retrieve PE data, such as import and export table. It's
handy because it can be trusted to work on all systems,
as long as the local codepage is not a wide character language.
(Japanese, Chinese, Korean.) It doesn't depend on anything
that's not in WSH.

GS

2019-09-16 16:32:29 UTC

Post by Mayayana

Post by GS
That's a very concise method; - it duplicates David Ireland's Radix 64
method exactly both ways. Nice job!!!

David Ireland? Radix64? I looked it up but only seem to
be finding a pastor.
I came up with it because many years ago I was writing
SMTP software and needed to incorporate Base64 operations.
I'm not sure where I got the code but it was probably
from VBSpeed. That's where I've always gone for high
efficiency operations.
Later I added Base64 to a binary
functions DLL for VBS, but I also wanted to provide as much
code as possible that didn't require my 3rd-party DLL. With
that I worked out that Textstream could actually handle
any binary operation if it's done in the right way. So then
I just adapted the VB code for VBS, adding it to a VBS binary
class, which is also online. The class includes sample scripts to
do things like lighten a BMP image, read file headers, and
do Base64 conversion -- all with only Textstream. I also use
those methods for scripts that extract icons from PE files
and retrieve PE data, such as import and export table. It's
handy because it can be trusted to work on all systems,
as long as the local codepage is not a wide character language.
(Japanese, Chinese, Korean.) It doesn't depend on anything
that's not in WSH.

Interesting that it works same as the VB6 solution I found by David Ireland
many years ago when I was developing my AppLicensing system. Yours is more
concise so I'm thinking to convert it for VB6/C# use.

--
Garry


Mayayana

2019-09-16 17:13:54 UTC

"GS" <***@v.invalid> wrote

| Interesting that it works same as the VB6 solution I found by David Ireland
| many years ago when I was developing my AppLicensing system. Yours is more
| concise so I'm thinking to convert it for VB6/C# use.
|

See here:

http://www.xbeat.net/vbspeed/c_Base64Enc.htm

It looks like I'm using #1, or close to it. But there's a big
difference for VB: It's possible to coerce a string into a
byte array using StrConv, while in VBS I have to do it
explicitly, one byte at a time, converting first to numeric
and then back to character. It's a very sloppy operation.

I've never actually experimented to see if that could
be optimized further. The VBS is plenty fast for the
things I need. For anything that needs speed I'd just
use VB6, anyway. And the VBS code I have is actually
somewhat faster without 76-characters-per-line formatting.
That brings the 3.6 MB file operation down from 13 and
9.2 to 9 and 8.5 seconds. But for my own uses I like
to leave that in. It allows me to decode from email
without worrying about such details.

Out of curiosity I tried the possibility of coercing from
byte to character with VBS Join. But it doesn't work. Not
surprisingly, it converts each number to text. So instead of
converting bytes 65 and 66 to "AB" it converts them to
the string "6566". :)
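
The difference between the two approaches can be reproduced in a few lines. This is only a sketch; the helper name `CharsFromByteValues` is made up for the example. Note that `Chr` maps values above 127 through the local codepage, so only the ASCII range is shown here.

```vbscript
Option Explicit

' Build a string from byte values explicitly, one character at a time.
Function CharsFromByteValues(vals)
    Dim i, out
    out = ""
    For i = 0 To UBound(vals)
        out = out & Chr(vals(i))
    Next
    CharsFromByteValues = out
End Function

Dim vals
vals = Array(65, 66)
WScript.Echo Join(vals, "")             ' 6566 (numbers turned into digits)
WScript.Echo CharsFromByteValues(vals)  ' AB
```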

GS

2019-09-17 00:26:50 UTC

Post by Mayayana
http://www.xbeat.net/vbspeed/c_Base64Enc.htm

Thanks, I'll follow that up!

C# uses the dotnet function, of course, but that doesn't work when running a C# app
on an OS without the .NET Framework.

--
Garry


Mayayana

2019-09-17 02:08:14 UTC

"GS" <***@v.invalid> wrote

| C# uses the dotnet function, of course, but doesn't work when running a C# app
| on an OS not using the .Net Framework.
|

I still don't really know anything about .Net, but
I suppose it must be a lot like VB in one respect:
Most things are fairly easy, but they can always be
improved by cutting out the runtime middleman. :)

Over the years I gradually went to API or straight
code for just about everything except GUI in VB. And
even sometimes for that. I use tabs in my code editor
that are drawn directly with GDI functions. The code
came from Jerry French; a few lines of code eliminating
the need for a tabstrip control.

GS

2019-09-18 02:08:21 UTC

Post by Mayayana

Post by GS
C# uses the dotnet function, of course, but doesn't work when running a C#
app on an OS not using the .Net Framework.

I still don't really know anything about .Net, but
Most things are fairly easy, but they can always be
improved by cutting out the runtime middleman. :)
Over the years I gradually went to API or straight
code for just about everything except GUI in VB. And
even sometimes for that. I use tabs in my code editor
that are drawn directly with GDI functions. The code
came from Jerry French; a few lines of code eliminating
the need for a tabstrip control.

Straight code is the direction I'm heading with C#. Tried that with VB[A] with
very good results so... "if it ain't broke.."! <g>

--
Garry


JJ

2019-09-16 11:27:47 UTC

Post by Mayayana
You don't need all that stuff. Base-64 is actually a fairly
simple math operation. There's no need to call in a special
component to do that part.

Actually, I need it for performance reasons. VBScript is too slow
compared with native code.

Mayayana

2019-09-16 13:55:31 UTC

"JJ" <***@vfemail.net> wrote

| > You don't need all that stuff. Base-64 is actually a fairly
| > simple math operation. There's no need to call in a special
| > component to do that part.
|
| Actually, I need it for performance reason. VBScript is too slow in
| comparison with native code.

You must be doing some very big conversions. Then why
not do it in VB? That's actually how I got my code. I'd
needed to do it in VB at one point and later just translated
that to VBS.

But you're right that it's not terribly fast. It's pretty much
instant for a typical attachment or email content, but I'd
never had occasion to use it for big things. So I just now
tested it.

135 KB - 0.4 seconds to encode, 0.3 seconds to decode
470 KB - 1.7 seconds to encode, 1.1 seconds to decode
3.6 MB - 13 seconds to encode, 9.2 seconds to decode

So it's pretty limited past 1 MB or so.

The MSXML/adodb version is much faster to encode. But
as you say, it doesn't actually work for a binary file. When
I tried it on a 4.3 MB Bible in plain text it was still fast to
encode. .2 seconds. But oddly it takes 8.3 seconds to decode.
Not much faster than the VBS.

I see at least a couple of problems with that code.

* For your decoding the load isn't a problem, but to encode
it won't work to do a ReadAll. It will stop at the first null.

* The code is only designed for string conversion. The author
calls functions "StringtoBinary" and "BinarytoString", but that's
just confusing. There's no binary data involved. Base64 is, by
definition, not binary data.

* ADODB, like FSO, takes a lot of the functionality out
of your hands. For a text stream I think you could use
something like "Windows-1252" to get 8-bit translation.
Anything that's ANSI and not ASCII. That might work.
But when I tried it it's still failing. It also fails if I set the
load type to text. (Why is the text loaded as binary when
it isn't?) And it fails if I set the write type to binary. I'm
guessing the problem may be because ADODB,
like FSO and TS, second guesses what you're doing. But
I'm not sure. I've rarely used ADODB and I'm not clear about
how it handles data.

The code you're using was only designed to work with
a text string, which of course is rarely what people are
doing with Base64.
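
One way to sidestep the string round trip when encoding is to hand MSXML a byte array directly. The sketch below assumes `ADODB.Stream` and `Msxml2.DOMDocument.3.0` are available; the function name `FileToBase64` is hypothetical. It reads the file in binary mode and assigns the bytes to `nodeTypedValue`, so no character set is involved at any point.

```vbscript
Option Explicit

Const adTypeBinary = 1

' Sketch: return the Base64 text for the raw bytes of a file.
' An empty file makes Read return Null, which is not handled here.
Function FileToBase64(path)
    Dim st, doc, node
    Set st = CreateObject("ADODB.Stream")
    st.Type = adTypeBinary
    st.Open
    st.LoadFromFile path

    Set doc = CreateObject("Msxml2.DOMDocument.3.0")
    Set node = doc.createElement("b64")
    node.dataType = "bin.base64"
    node.nodeTypedValue = st.Read   ' byte array in, Base64 text out
    st.Close

    FileToBase64 = node.text        ' may contain line breaks added by MSXML
End Function

If WScript.Arguments.Count > 0 Then
    WScript.Echo FileToBase64(WScript.Arguments(0))
End If
```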

That stuff drives me crazy. People post code that works
rather than what's needed. What good is base-64 conversion
that won't work with binary data? One could use it to lightly
encode something like an email address. That's about it.

Another great example of this is
SetupIterateCabinet from setupapi.dll. When I needed CAB
file operations I searched all over. Everyone in VB said
SetupIterateCabinet was the method. The only method.
I finally figured out that I was having problems with it
because it can only handle 1 or 3 CAB types. CAB functions
are in cabinet.dll. setupapi.dll is just a wrapper for Microsoft
setups. Since cabinet.dll is CDECL and VB can't call it
directly, everyone just pretended that SetupIterateCabinet
worked!

Mayayana

2019-09-16 14:32:44 UTC

Woops.

"I finally figured out that I was having problems with it
because it can only handle 1 or 3 CAB types."

That should read "1 of 3". CABs can be compressed
with MSZIP, LZX, or a third type that I've forgotten
now. MSZIP is common. But LZX is not uncommon.
Which means that SetupIterateCabinet is all but useless
in distributed software.

JJ

2019-09-17 13:15:09 UTC

Post by Mayayana
You must be doing some very big conversions.

It can either be big, or small but many.

Post by Mayayana
Then why not do it in VB?

I need it to be easily modified for corrections or expandability. And to
avoid being shunned just because it was a third party EXE.

Post by Mayayana
The MSXML/adodb version is much faster to encode. But
as you say, it doesn't actually work for a binary file. When
I tried it on a 4.3 MB Bible in plain text it was still fast to
encode. .2 seconds. But oddly it takes 8.3 seconds to decode.
Not much faster than the VBS.

I'm kind of obsessed with efficiency and performance, so I can't help but
use one with a better performance even if it's only a small difference - as
long as it's still within the restrictions.

Post by Mayayana
* For your decoding the load isn't a problem, but to encode
it won't work to do a ReadAll. It will stop at the first null.

I don't encounter such a problem on my Win7 system. `ReadAll()` doesn't stop
at a Null character. The file is opened in ASCII mode, FYI. I've already
tested it with various binary files. WinXP doesn't have this problem either.

Do you have a working script and data file to reproduce the problem?

Post by Mayayana
* The code is only designed for string conversion. The author
calls functions "StringtoBinary" and "BinarytoString", but that's
just confusing. There's no binary data involved. Base64 is, by
definition, not binary data.

That is true, but a VB string is just a container for data which is treated as
text. The VB string itself can store Null characters, 8-bit characters, and
even Unicode characters. Those functions use ADODB.Stream to convert
the raw data of the string to a byte array, and vice versa.

Post by Mayayana
* ADODB, like FSO, takes a lot of the functionality out
of your hands. For a text stream I think you could use
something like "Windows-1252" to get 8-bit translation.
Anything that's ANSI and not ASCII. That might work.

Using `windows-1252` character set works for me, but that's because the
system character set is also `windows-1252`. It won't work if the system
character set is not `windows-1252`.

I thought the `x-user-defined` character set was neutral (usable with any
codepage), but somehow it doesn't work if the system character set is not
`windows-1252`. FYI, I've only tested it with the `windows-1253` system
character set (Greek). I guess I'll have to retrieve the name of the system
character set before the conversion, considering that there's no character set
label that means "the active character set".
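
Since the result evidently depends on the system codepage, it may help to test a charset on the machine in question rather than guess. The following is a rough sketch of such a check, not something from the original code: the helper names and the temp file name `roundtrip.tmp` are made up, and it relies on MSXML's `bin.hex` data type to build and inspect byte arrays. It decodes bytes 0x00-0xFF with a given ADODB charset, writes the string with FSO in ANSI mode, and compares the file's bytes with the originals.

```vbscript
Option Explicit

Const adTypeBinary = 1
Const adTypeText = 2

' Hex text -> byte array, via MSXML's bin.hex data type.
Function HexToBytes(hexText)
    Dim doc, node
    Set doc = CreateObject("Msxml2.DOMDocument.3.0")
    Set node = doc.createElement("h")
    node.dataType = "bin.hex"
    node.text = hexText
    HexToBytes = node.nodeTypedValue
End Function

' Byte array -> lowercase hex text with any whitespace removed.
Function BytesToHex(bytes)
    Dim doc, node, t
    Set doc = CreateObject("Msxml2.DOMDocument.3.0")
    Set node = doc.createElement("h")
    node.dataType = "bin.hex"
    node.nodeTypedValue = bytes
    t = LCase(node.text)
    BytesToHex = Replace(Replace(Replace(t, vbCr, ""), vbLf, ""), " ", "")
End Function

' Byte array -> string, decoded with the given ADODB charset name.
Function BytesToString(bytes, charsetName)
    Dim st
    Set st = CreateObject("ADODB.Stream")
    st.Type = adTypeBinary
    st.Open
    st.Write bytes
    st.Position = 0
    st.Type = adTypeText
    st.Charset = charsetName
    BytesToString = st.ReadText
    st.Close
End Function

' Raw bytes of a file.
Function ReadBytes(path)
    Dim st
    Set st = CreateObject("ADODB.Stream")
    st.Type = adTypeBinary
    st.Open
    st.LoadFromFile path
    ReadBytes = st.Read
    st.Close
End Function

' True only if bytes -> string -> FSO ANSI file reproduces the same bytes.
Function RoundTripsCleanly(hexText, charsetName, tmpPath)
    Dim fso, f, s
    RoundTripsCleanly = False
    s = BytesToString(HexToBytes(hexText), charsetName)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set f = fso.CreateTextFile(tmpPath, True, False)
    On Error Resume Next
    f.Write s                        ' can fail on unmappable characters
    If Err.Number = 0 Then
        f.Close
        RoundTripsCleanly = (BytesToHex(ReadBytes(tmpPath)) = LCase(hexText))
    Else
        Err.Clear
        f.Close
    End If
    On Error GoTo 0
End Function

Dim i, allBytes, cs
allBytes = ""
For i = 0 To 255
    allBytes = allBytes & LCase(Right("0" & Hex(i), 2))
Next
For Each cs In Array("us-ascii", "windows-1252", "x-user-defined")
    WScript.Echo cs & ": " & RoundTripsCleanly(allBytes, cs, "roundtrip.tmp")
Next
```

No expected output is given because the answers are exactly what varies between systems.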

Mayayana

2019-09-17 14:04:47 UTC

"JJ" <***@vfemail.net> wrote

| I don't encounter such problem in my Win7 system. `ReadAll()` doesn't stop
| at Null character. The file is opened using ASCI mode, FYI. I've already
| tested it with various binary files. WinXP doesn't have this problem too.
|
| Do you have a working script and data file to reproduce the problem?
|

Have to run out now. I'll retest that later.

| Using `windows-1252` character set works for me, but that's because the
| system character set is also `windows-1252`. It won't work if the system
| character set is not `windows-1252`.
|
Ick. You're right. I tried Greek and it sort of works but
a few things get translated, apparently due to codepage
translation.

Mayayana

2019-09-18 03:04:39 UTC

"JJ" <***@vfemail.net> wrote

| > * For your decoding the load isn't a problem, but to encode
| > it won't work to do a ReadAll. It will stop at the first null.
|
| I don't encounter such problem in my Win7 system. `ReadAll()` doesn't stop
| at Null character. The file is opened using ASCI mode, FYI. I've already
| tested it with various binary files. WinXP doesn't have this problem too.
|
| Do you have a working script and data file to reproduce the problem?
|

You seem to be right. I thought I remembered ReadAll
not working. A test on a GIF file showed it reading the whole
file when I checked Len on what was read in. But it is tricky.
For instance, Left(s, 100) only returns up to the first null.
And trying to write it back to disk fails. I would think the best
solution here would be to avoid FSO altogether, but I don't
know much about adodb and msxml, to know whether you could
just use those.
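
For reference, here is a minimal sketch of decoding without FSO at all, and without ever turning the decoded bytes into a string. It assumes the same `input.b64` and `output.bin` file names used at the top of the thread; the three helper names are invented for the example. The key step is `SaveToFile` on a binary-mode stream, which writes the byte array as-is, so no character set can alter the data.

```vbscript
Option Explicit

Const adTypeBinary = 1
Const adTypeText = 2
Const adSaveCreateOverWrite = 2

' Read a Base64 text file; Base64 is plain ASCII, so us-ascii is safe here.
Function ReadAsciiText(path)
    Dim st
    Set st = CreateObject("ADODB.Stream")
    st.Type = adTypeText
    st.Charset = "us-ascii"
    st.Open
    st.LoadFromFile path
    ReadAsciiText = st.ReadText
    st.Close
End Function

' Base64 text -> byte array, using MSXML's typed node value.
Function Base64ToBytes(b64)
    Dim doc, node
    Set doc = CreateObject("Msxml2.DOMDocument.3.0")
    Set node = doc.createElement("z")
    node.dataType = "bin.base64"
    node.text = b64
    Base64ToBytes = node.nodeTypedValue
End Function

' Write a byte array straight to disk in binary mode.
Sub WriteBytes(path, bytes)
    Dim st
    Set st = CreateObject("ADODB.Stream")
    st.Type = adTypeBinary
    st.Open
    st.Write bytes
    st.SaveToFile path, adSaveCreateOverWrite
    st.Close
End Sub

WriteBytes "output.bin", Base64ToBytes(ReadAsciiText("input.b64"))
```

The trade-off is that the decoded data is only ever available as a byte array, which VBScript itself can do little with beyond passing it to another component.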

JJ

2019-09-18 09:17:48 UTC

Post by Mayayana
For instance, Left(s, 100) only returns up to the first null.
And trying to write it back to disk fails.

I don't have that problem when I tested it in both Win7 and WinXP. When I
use the 256 bytes binary data from the decoded Base64 string in my previous
post as the test binary file, everything works fine. And BTW, the WSH in
WinXP(SP3) is v5.7.0.16599, and in Win7 is v5.8.7600.16385. The version
numbers are retrieved from WSCRIPT.EXE file's version information.

set fs = createobject("scripting.filesystemobject")
set f = fs.opentextfile("test.bin")
s = f.readall
f.close
wscript.echo len(left(s, 100)) 'shows 100
z = left(s, 100)
wscript.echo len(z) 'shows 100
set f = fs.createtextfile("part.bin", true)
f.write(z)
f.close
wscript.echo fs.getfile("part.bin").size 'shows 100

Mayayana

2019-09-18 14:12:39 UTC

"JJ" <***@vfemail.net> wrote

| WinXP(SP3) is v5.7.0.16599

Yes.

| set fs = createobject("scripting.filesystemobject")
| set f = fs.opentextfile("test.bin")
| s = f.readall
| f.close
| wscript.echo len(left(s, 100)) 'shows 100
| z = left(s, 100)
| wscript.echo len(z) 'shows 100

Not len. Just Left(s, 100). It should show the first
100 characters. MsgBox Left(s, 100). With your
file I get a blank. With a GIF file I get only the first
few characters, up to the first null.
When I tried writing your file back to disk it did
work, as you said, but doing the same with a GIF
showed "invalid procedure call or argument". I don't
know why the discrepancy.

It's unpredictable because the FSO designers were
assuming dumb IT people who only needed to deal with
text. So you can handle nulls but in some cases they'll
mess things up if you don't do it just so. That also seems to
be the cause of the CharSet problem. FSO
is altering characters "to be helpful". So when I used
the Greek charset code to write a GIF back to disk, it was
mostly OK, but there were minor glitches. For instance
chr(34), a quote, got changed to something else that looked
like 2 commas.

JJ

2019-09-19 14:23:29 UTC

Post by Mayayana
Not len. Just Left(s, 100). It should show the first
100 characters. MsgBox Left(s, 100).

Well, that's because in most cases, the OS and applications store strings in
null-terminated storage, and treat them as null-terminated strings. So of
course characters following a Null character will never show.

VBScript stores a string in a variant, where the string storage has a length
field and a string data field.
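
A quick way to see what a string really holds, without relying on a dialog that stops at the first null, is to dump its code units. This is only a sketch and `DumpCodeUnits` is a made-up helper name.

```vbscript
Option Explicit

' List every 16-bit code unit of s as 4 hex digits, nulls included.
Function DumpCodeUnits(s)
    Dim i, out
    out = ""
    For i = 1 To Len(s)
        If i > 1 Then out = out & " "
        ' AscW can return a negative Integer; mask it to 0..65535.
        out = out & Right("000" & Hex(AscW(Mid(s, i, 1)) And 65535), 4)
    Next
    DumpCodeUnits = out
End Function

Dim s
s = "AB" & Chr(0) & "CD"
WScript.Echo Len(s)            ' 5
WScript.Echo DumpCodeUnits(s)  ' 0041 0042 0000 0043 0044
```

The length and the dump show that the characters after the null are stored in the string, whatever a message box chooses to display.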

Mayayana

2019-09-19 15:47:30 UTC

"JJ" <***@vfemail.net> wrote

| > Not len. Just Left(s, 100). It should show the first
| > 100 characters. MsgBox Left(s, 100).
|
| Well, that's because in most cases, the OS and applications store strings in
| null-terminated storage, and treat them as null-terminated strings. So of
| course characters following Null characters will never show.
|
| VBScript stores string in variant type where the string storage has a length
| field, and the string data field.

Yes, but it gets complicated. As you know, VB is using
unicode behind the scenes but ANSI when a VB programmer
touches the string. ANSI does not have length bytes and
only one null signals the end of the string. Which is what
we're seeing when VBS doesn't show the whole string in a
msgbox. It thinks it has a string of 100 bytes, but then
when you look at it the 100 bytes are not there.

It's fine if it works for you. I'm just saying that in my
experience it doesn't always work, but that a few tricks can
be used to make sure it works on all but DBCS systems.

My class for doing that is here, in case anyone's
interested:

https://www.jsware.net/jsware/scrfiles.php5#bints

The operations are unavoidably clunky, but it allows for
fully dealing with textstream read/write as binary data, so
that one can handle binary files without unexpected
glitches.

Also, see my post today to Rudy. I get very different
behavior between Read(lenfile) and ReadAll.

R.Wieser

2019-09-18 09:30:06 UTC

Mayayana,

I thought I remembered ReadAll not working.

You remembered correctly.

A test on a GIF file showed it reading the whole
file when I checked Len on what was read in.

:-) You got the expected size and as such concluded that the data must all
be there ? In that case you fell headlong into its pit.

The problem is that (on XPsp3) it /allocates/ as much memory as the size of the
file (but does not zero it out!), but then stops reading at the first zero
in that file. And that means you get part file, and part garbage.

Yeah, been there, done that and got thoroughly baffled. :-(

@JJ,
Did you compare the file contents ? (FC /B test.bin part.bin)

Regards,
Rudy Wieser

JJ

2019-09-18 10:43:36 UTC

Post by R.Wieser
@JJ,
Did you compare the file contents ? (FC /B test.bin part.bin)

Yes, I did. With that 256 bytes binary file.

Mayayana

2019-09-18 14:16:04 UTC

"JJ" <***@vfemail.net> wrote

| > @JJ,
| > Did you compare the file contents ? (FC /B test.bin part.bin)
|
| Yes, I did. With that 256 bytes binary file.

Your sample file does seem to work better for some
reason. Weird stuff.

If it keeps working for you then that's fine. But if
you have any trouble you might want to just check
file size and then do Read(filesize). I've been using
a binary FSO class for years that's been dependable,
but I use specific methods like that, never "looking a
null in the eye". :)

R.Wieser

2019-09-18 14:49:56 UTC

JJ,

Post by JJ
Yes, I did. With that 256 bytes binary file.

Odd that it works for you ... The only difference I can think of is that I
tried to write & read widestring files.

Regards,
Rudy Wieser

Mayayana

2019-09-19 13:29:49 UTC

"R.Wieser" <***@not.available> wrote

|
| > Yes, I did. With that 256 bytes binary file.
|
| Odd that it works for you ... The only difference I can think of is that I
| tried to write & read widestring files.
|

It gets even more weird. First I should say that I'm
almost never dealing with unicode in these cases. I'm
assuming we're talking about "binary" files treated as
ANSI text.

If I open, read and write to disk JJ's short sample file,
even with a couple of extra nulls added in the middle,
it works. And Len shows the full length of the file after
reading it in, but Left(s1, 100) shows nothing.

If I do the exact same thing with a small GIF file I
get an invalid operation error when I try to write it back
to disk as a new file. But if I do it my way,
getting the length of file and doing a TS.Read(filelen)
then it works fine to write it back to disk. In both cases
wscript tells me the string I read in is length 13,297.
But the string acquired via ReadAll doesn't work.

I got curious about how much ADODB can do and
found another glitch: ADODB was removed on Server
2003. I don't know why. Security? But so far I haven't
found a way to use that to bypass FSO.

If I read in a GIF with ADODB it only allows me to
read it in binary. It seems to be sniffing the file. Weird.
If I set it to text I get an error "Operation not allowed in
this context."
If I read it in as binary then MSXML can't convert it
to Base-64. I'm finding MSXML very opaque to work with.
The method for doing a Base-64 conversion is bizarre.
Just assign the data type and insert a string and, presto,
like a magician it does the conversion, despite no explicit
call to a conversion method. That appears to be some kind
of bug-based hack that someone discovered along the way.

It looks like MSXML can do a lot of handy things but
the docs and the object model are pretty much impenetrable
to me. And since it's mainly for handling XML, which I
have no use for, I'm not inspired to dig out these useful
nuggets, like implicit data transformation.

R.Wieser

2019-09-19 18:10:33 UTC

JJ,

Post by R.Wieser
Odd that it works for you ...

I generated your testfile with the chars 0 ... 255, which, using "readall",
worked perfectly for me too. So I experimented a bit.

As it turns out I got my "it's garbage!" problem back when I prefixed the
example file with just the word "Hello". Turns out that at least 5
characters is all it takes ("aaaaa" makes a mess as well).

However, that doesn't explain why your GIF file worked though.

Regards,
Rudy Wieser

R.Wieser

2019-09-20 06:04:16 UTC

JJ,

Turns out that at least 5 characters is all it takes ("aaaaa" makes a mess
as well)

This morning I realized that by prefixing those five chars, changing the
file's contents was not all I did; it also changed the size of the file.

And whatdoyouknow, just changing the size of the file to be at least 261
chars (prepending, appending, randomly inserting any content you like)
causes the trashing.

And something remarkable: It's only the first 260 bytes that get trashed.
From character 261 on, the original, expected content is visible again.

Regards,
Rudy Wieser

P.s.
Could you check the size of the GIF file (the one that worked for you)?
Chances are it's less than 261 bytes ...

JJ

2019-09-20 09:06:29 UTC

Post by R.Wieser
JJ,

Post by R.Wieser
Odd that it works for you ...

I generated your testfile with the chars 0 ... 255, which, using "readall",
worked perfectly for me too. So I experimented a bit.
As it turns out I got my "it's garbage!" problem back when I prefixed the
example file with just the word "Hello". Turns out that at least 5
characters is all it takes ("aaaaa" makes a mess as well)
However, that doesn't explain why your GIF file worked though.
Regards,
Rudy Wieser

You're right. That test file shows that ReadAll() is inconsistent and
buggy.

I also tried a test file which contains 0x00-0xFF followed by another
0x00-0xFF, totalling 512 bytes. While ReadAll() seems to succeed and the
variable length is 512, the received data is erroneous.

However, I found that using Read() with a character count equal to or larger
than the file size works (the read count can't be too big, though). I've
already tested it with the above test file, and with that "Hello"+(0x00-0xFF)
test file. E.g.

on error resume next
set fs = createobject("scripting.filesystemobject")
s256 = ""
for i = 0 to 255
  s256 = s256 & chr(i)
next
hello256 = "Hello" & s256
set f = fs.createtextfile("hello256.bin", true, false)
f.write hello256
f.close
set f = fs.opentextfile("hello256.bin", 1, false, 0)
s = f.readall
f.close
if s = hello256 then
  wscript.echo "readall() ok"
else
  wscript.echo "readall() fail" 'this one is shown
end if
set f = fs.opentextfile("hello256.bin", 1, false, 0)
s = ""
do while not f.atendofstream
  s = s & f.read(1048576)
loop
f.close
if s = hello256 then
  wscript.echo "read(x) ok" 'this one is shown
else
  wscript.echo "read(x) fail"
end if

For further testing, I use the code below as a binary file copier.

set fs = createobject("scripting.filesystemobject")
set f = fs.opentextfile(wscript.arguments(0), 1, false, 0)
s = ""
do while not f.atendofstream
  s = s & f.read(1048576)
loop
f.close
set f = fs.createtextfile(wscript.arguments(1), true, false)
f.write s
f.close

I use it with the batch file below to test-copy all files in Windows' SYSTEM32
folder, validating each file copy.

@echo off
setlocal
for %%A in (c:\windows\system32\*) do (
  echo %%~nxA...
  cscript //nologo filecopy.vbs "%%A" test.tmp
  fc /b "%%A" test.tmp > nul
  if errorlevel 1 (
    echo "%%~nxA" copy is not identical!
    pause
    exit
  )
)

R.Wieser

2019-09-22 10:15:51 UTC

JJ,

Post by JJ
You're right. That test file, shows that ReadAll() is inconsistent
and buggy.

I've been taking a peek inside scrrun.dll (v5.7.0.16599), and have found
that - most likely - it all boils down to a single mistake: the routine (at
735A32E4) which copies a wide-string from one spot to another stops at a
(word)zero (at 735A333A...E), even when it has been given a length argument
for the source.

Padding the branch there with NOPs causes "readall" to return the file's full
contents.

Combine that with the memory-allocation just grabbing some heap-space
without clearing it and you know what happens .... Yep, the observed
garbage.
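
The combination described above can be pictured with a toy model. This is not the scrrun.dll routine, only an illustration in script form: a copy loop that is given a length but still stops at the first zero, writing into a destination that was never cleared. The function name and the 170 filler values standing in for old heap contents are invented, and the model makes no attempt to reproduce the 260-byte detail reported earlier in the thread.

```vbscript
Option Explicit

' Toy model only: copy up to srcLen values, but stop at the first zero,
' leaving the rest of dest with whatever it already held.
Function CopyStoppingAtZero(src, srcLen, dest)
    Dim i
    For i = 0 To srcLen - 1
        If src(i) = 0 Then Exit For    ' modeled mistake: length is ignored
        dest(i) = src(i)
    Next
    CopyStoppingAtZero = dest
End Function

Dim src, dest, result
src = Array(72, 105, 0, 33, 33)          ' "Hi", a zero, then "!!"
dest = Array(170, 170, 170, 170, 170)    ' stand-in for uncleared heap memory
result = CopyStoppingAtZero(src, 5, dest)
WScript.Echo Join(result, " ")           ' 72 105 170 170 170
```

The result has the full expected length, yet everything from the zero onward is leftover data, which matches the "right size, wrong content" symptom.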

I've already been thinking of patching it, but as it can easily be circumvented
in script code, and such a patch would make scripts incompatible with other
computers, I don't think I should.

Regards,
Rudy Wieser

Mayayana

2019-09-22 13:38:04 UTC

| > You're right. That test file, shows that ReadAll() is inconsistent
| > and buggy.
|
| I've been taking a peek inside scrrun.dll (v5.7.0.16599), and have found
| that - most likely - it all boils down to a single mistake: the routine (at
| 735A32E4) which copies a wide-string from one spot to another stops at a
| (word)zero (at 735A333A...E), even when it has been given a length argument
| for the source.
|
| Padding the branch there with NOPs causes "readall" to return the files full
| contents.
|
|

Very clever. I wonder if it was really a bug, though. As I
understand it, the "scripting guys" wrote the files and they
clearly didn't think much of their clientele. There was a
famous posting that told scripters, in a condescending,
scolding tone, that they shouldn't try to do binary operations
with WSH. It was intended to just be a GUI update for BAT
files, to be used by sys admins who don't really know what
they're doing. The whole thing is designed with "ninny
barriers".

R.Wieser

2019-09-22 14:49:16 UTC

Mayayana,

Post by Mayayana
Very clever.

Thanks, but not really. Just some (freeware) IDA, a bit of programming and
a lot of trace-and-tracking.

Post by Mayayana
I wonder if it was really a bug, though.

As far as I can tell it has to be. Otherwise they could (should) have
stopped reading the contents (which they do 128 bytes at a time) as soon as
they encountered a Zero, and returned a string sized up to it (and no
more). Reading (way) beyond it simply doesn't make sense (especially not
when executed on a remote file/stream ...).

Post by Mayayana
There was a famous posting that told scripters, in a
condescending, scolding tone, that they shouldn't try to do
binary operations with WSH.

:-) That sounds like inventing restrictions to match the found bugs.

Over time I've encountered a number of win32 areas where something has been
designed to solve one problem - theirs - and the rest doesn't really matter.

Regards,
Rudy Wieser

Mayayana

2019-09-16 19:58:35 UTC

The following seems to work, but I got to fooling around
and now I'm not sure what I did right. :) It looks like the only
difference was in changing the CharSet value, but I thought
I'd tried that before and it didn't work. In any case, this code
seems to work now. On a 3.6 MB file it was very quick to encode
(.17 seconds) but oddly slow to decode (5.5 seconds). It also works
fine on your sample.

So it may have only been a problem with using ascii as
CharSet. It didn't matter to the original author because he
only wanted to encode English text. Though I don't understand
why you couldn't use your "x-user-defined". In any case,
Windows-1252 works, as should any other ANSI encoding.

Dim LRet, Arg, FSO, TS, OFil, LSize, sOut, sIn

Arg = WScript.Arguments(0)

LRet = MsgBox("Click yes to encode file or no to decode.", 36)
If LRet = 6 Then
    IfEncode = True
Else
    IfEncode = False
End If

Set FSO = CreateObject("Scripting.FileSystemObject")
Set OFil = FSO.GetFile(Arg)
LSize = OFil.Size
Set OFil = Nothing
Set TS = FSO.OpenTextFile(Arg)
sIn = TS.Read(LSize)
Set TS = Nothing

t1 = timer
If ifencode = True Then
    sOut = Base64Encode(sIn)
    Set TS = FSO.CreateTextFile(Arg & "-en64", True)
    TS.Write sOut
    TS.Close
    Set TS = Nothing
Else
    sOut = Base64Decode(sIn)
    Set TS = FSO.CreateTextFile(Arg & "-de64", True)
    TS.Write sOut
    TS.Close
    Set TS = Nothing
End If
t2 = timer
Set FSO = Nothing
MsgBox CStr(t2 - t1)

Function Base64Encode(sText)
    Dim oXML, oNode
    Set oXML = CreateObject("Msxml2.DOMDocument.3.0")
    Set oNode = oXML.CreateElement("base64")
    oNode.dataType = "bin.base64"
    oNode.nodeTypedValue = Stream_StringToBinary(sText)
    Base64Encode = oNode.text
    Set oNode = Nothing
    Set oXML = Nothing
End Function

Function Base64Decode(ByVal vCode)
    Dim oXML, oNode
    Set oXML = CreateObject("Msxml2.DOMDocument.3.0")
    Set oNode = oXML.CreateElement("base64")
    oNode.dataType = "bin.base64"
    oNode.text = vCode
    Base64Decode = Stream_BinaryToString(oNode.nodeTypedValue)
    Set oNode = Nothing
    Set oXML = Nothing
End Function

Private Function Stream_StringToBinary(Text)
    Const adTypeText = 2
    Const adTypeBinary = 1
    Dim BinaryStream 'As New Stream
    Set BinaryStream = CreateObject("ADODB.Stream")
    BinaryStream.Type = adTypeText
    BinaryStream.CharSet = "Windows-1252"
    BinaryStream.Open
    BinaryStream.WriteText Text
    BinaryStream.Position = 0
    BinaryStream.Type = adTypeBinary
    BinaryStream.Position = 0
    Stream_StringToBinary = BinaryStream.Read
    Set BinaryStream = Nothing
End Function

Private Function Stream_BinaryToString(Binary)
    Const adTypeText = 2
    Const adTypeBinary = 1
    Dim BinaryStream 'As New Stream
    Set BinaryStream = CreateObject("ADODB.Stream")
    BinaryStream.Type = adTypeBinary
    BinaryStream.Open
    BinaryStream.Write Binary
    BinaryStream.Position = 0
    BinaryStream.Type = adTypeText
    BinaryStream.CharSet = "Windows-1252"
    Stream_BinaryToString = BinaryStream.ReadText
    Set BinaryStream = Nothing
End Function
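
Whether a given charset really preserves all 256 byte values can be checked directly instead of guessed. The following is only a sketch under some assumptions: HexToBytes and RoundTripsCleanly are made-up helper names, the byte array is built through MSXML's "bin.hex" data type, and the check is only meaningful for single-byte charsets (charsets that write a leading marker, such as utf-8, will not compare equal here).

```vbscript
' Sketch: push all 256 byte values through a charset and see whether they survive.
' HexToBytes and RoundTripsCleanly are illustrative names, not library calls.
Function HexToBytes(sHex)              ' hex text -> Variant of subtype Byte()
    With CreateObject("Msxml2.DOMDocument").CreateElement("h")
        .DataType = "bin.hex"
        .Text = sHex
        HexToBytes = .NodeTypedValue
    End With
End Function

Function RoundTripsCleanly(sCharset)
    Dim i, sHex, sText, bBack
    For i = 0 To 255
        sHex = sHex & Right("0" & Hex(i), 2)
    Next

    With CreateObject("ADODB.Stream")  ' bytes -> string
        .Type = 1                      ' adTypeBinary
        .Open
        .Write HexToBytes(sHex)
        .Position = 0
        .Type = 2                      ' adTypeText
        .Charset = sCharset
        sText = .ReadText
        .Close
    End With

    With CreateObject("ADODB.Stream")  ' string -> bytes
        .Type = 2                      ' adTypeText
        .Charset = sCharset
        .Open
        .WriteText sText
        .Position = 0
        .Type = 1                      ' adTypeBinary
        bBack = .Read
        .Close
    End With

    ' Every byte must come back at its original position with its original value.
    RoundTripsCleanly = (LenB(bBack) = 256)
    For i = 1 To LenB(bBack)
        If AscB(MidB(bBack, i, 1)) <> i - 1 Then RoundTripsCleanly = False
    Next
End Function

WScript.Echo "us-ascii: " & RoundTripsCleanly("us-ascii")
WScript.Echo "windows-1252: " & RoundTripsCleanly("windows-1252")
WScript.Echo "x-user-defined: " & RoundTripsCleanly("x-user-defined")
```

Going by what JJ reported, "us-ascii" would be expected to fail this check; the other two results are worth confirming on the machine in question rather than taking on faith.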

Schmidt

2019-09-20 19:44:17 UTC

Post by JJ
I'm writing a slimmed down version of Base64 decoder...

Not sure, why there's so much "confusion" about this
(and why one should read binary FileData into a String first).

The modus-operandi with Base64 is Binary:
- passed as Input (as ByteArray) into the encoder
- and returned as Output-(ByteArray) from the decoder

Here's some (pretty symmetrical) Helpers,
which do as they should in my *.asp-Scripts (on Win2008/Win2012 and
Win2016):

Function Base64Encode(Bytes) 'expects VarType "Byte()", returns a B64-String
    With CreateObject("Msxml2.DOMDocument").CreateElement("e")
        .DataType = "bin.base64"
        .NodeTypedValue = Bytes
        Base64Encode = .Text
    End With
End Function
Function Base64Decode(sBase64) 'expects a B64-String, returns VarType "Byte()"
    With CreateObject("Msxml2.DOMDocument").CreateElement("e")
        .DataType = "bin.base64"
        .Text = sBase64
        Base64Decode = .NodeTypedValue
    End With
End Function

Function ReadBytesFromFile(FileName) 'returns VarType "Byte()"
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 1 'adTypeBinary
        .LoadFromFile FileName
        ReadBytesFromFile = .Read
        .Close
    End With
End Function
Sub WriteBytesToFile(FileName, Bytes) 'expects VarType "Byte()"
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 1 'adTypeBinary
        .Write Bytes
        .SaveToFile FileName, 2 'adSaveCreateOverWrite
        .Close
    End With
End Sub

HTH

Olaf

Mayayana

2019-09-20 21:11:42 UTC

"Schmidt" <***@vbRichClient.com> wrote

| Not sure, why there's so much "confusion" about this
| (and why one should read binary FileData into a String first).
|

Olaf! I didn't know you did scripting.

Did you try your code? I can't get it to work. I'd tried
out of curiosity earlier, to cut out FSO, but there seems
to be a conflict with types. ADODB binary read is an array
of bytes. MSXML expects variants.

My quick test tries this:

a = ReadBytesFromFile(arg)
s = Base64Encode(a)
WriteBytesToFile Arg & "-64.txt", s

arg is the path of a dropped GIF. WScript.Arguments(0)

Error, on the line .Write Bytes in WriteBytesToFile:

Arguments are of the wrong type, are out of acceptable range, or are in
conflict with one another.

When I then encode a file and drop that to run the
decode, I get "Error parsing as bin-base64 datatype".
That's in Base64Decode at the line .Text = sBase64

Schmidt

2019-09-20 22:18:44 UTC

Post by Mayayana
Olaf! I didn't know you did scripting.

Well, I do... (a lot) - mostly at the serverside though
(in the context of WebApps).

Less often on the Desktop (but then using vbRichClient5
as the HelperLib for VBScript-enhancements, which go as
far as supporting e.g. __stdcall and __cdecl Dll-calls -
but also allow DB-based GUI-Apps without registering anything).

Here is the package of ScriptGUI5:
http://vbRichClient.com/Downloads/ScriptGUI5.zip

Which should work (without touching the registry)
on all Win-Systems > XP (on XP you'll need registering)

I've developed this tool primarily, to help blind people
(who for the most part prefer to develop in NotePad(++)
or some other simple editor instead of a "graphical IDE").

I know, that you did something similar for these guys -
perhaps you will find especially the fruitbasket-demo
interesting (which shows GUI-design without "using Pixels"
for Control-Placement, and has Speech-Support).

Post by Mayayana
Did you try your code? I can't get it to work. I'd tried
out of curiosity earlier, to cut out FSO, but there seems
to be a conflict with types.

Then one should take "better care" of the types
(within "those Variants", which is all VBScript knows).

A helpful (Debugging-)Function is TypeName(...).

Those functions I've commented with 'expects "Byte()"',
were referring to the appropriate Variant-SubType.

Post by Mayayana
ADODB binary read is an array of bytes.

Yes, and such a ByteArray can be perfectly hosted within
a VBScript-Variant (but further used only, for "passing it along").

Post by Mayayana
MSXML expects variants.

Yep - and certain Properties deliver - or expect,
ByteArrays (within Variants).
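
To make the subtypes visible, a short sketch along these lines can be run on its own. It assumes only the MSXML element already used in this thread; the variable names are illustrative and "AAEC" is simply the Base64 text for the three bytes 00 01 02.

```vbscript
' Inspect the Variant subtypes that travel between MSXML and ADODB.Stream.
Dim node, b, s
Set node = CreateObject("Msxml2.DOMDocument").CreateElement("e")
node.DataType = "bin.base64"
node.Text = "AAEC"                 ' Base64 for the three bytes 00 01 02
b = node.NodeTypedValue            ' decoded data
s = node.Text                      ' encoded text

WScript.Echo TypeName(b)           ' Byte()
WScript.Echo VarType(b)            ' 8209 = vbArray (8192) + vbByte (17)
WScript.Echo LenB(b)               ' 3 bytes
WScript.Echo TypeName(s)           ' String
```

Anything that reports "String" here has to be converted before it can be handed to a function that expects "Byte()".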

Post by Mayayana
a = ReadBytesFromFile(arg)
s = Base64Encode(a)
WriteBytesToFile Arg & "-64.txt", s

As commented in the Signature for WriteBytesToFile,
you'll have to pass a Variant of SubType "Byte()",
not a Variant of SubType String (your s Variable).

If you want to write "a String" to a File (with
the above Function WriteBytesToFile), then you'll
have to convert it to SubType "Byte()" first.

Usual Candidates (for such conversions) are Functions like:
- StringToANSIBytes
- StringToUTF8Bytes
(I can post routines for that, if needed)

Your example (to stay OnTopic with Base64) should work e.g. this way:

a = ReadBytesFromFile(arg) 'read a file without interpretation to bytes
s = Base64Encode(a) 'encode ByteArray a into s as Base64-Content
b = Base64Decode(s) 'decode s-Base64-Content back into a ByteArray b
WriteBytesToFile Arg & ".EncDec", b 'write ByteArray b into a File

The Base64-string (in your case s), is in almost all scenarios
"a temporary thing" (it does not deserve to be written to disk) -
usually it gets "passed along" within JSON-Objects or XML-Nodes.

E.g. if you receive Base64-content as a String from a WebRequest
(for example, a JPG-file when it was encoded at the server, and
then passed "downwards" in a JSON-Result-Response), then you
might want to write that clientside received "JPG-Base64-String"
to disk after decoding - e.g. in a single line of code like:

WriteBytesToFile "c:\temp\my.jpg", Base64Decode(sB64jpgContent)

HTH

Olaf

Mayayana

2019-09-21 01:26:39 UTC

"Schmidt" <***@vbRichClient.com> wrote

| Here is the package of ScriptGUI5:
| http://vbRichClient.com/Downloads/ScriptGUI5.zip
|

With help files. I'm impressed.

| I know, that you did something similar for these guys -

I did a few small things and started working on a screen
reader, for a friend, but then he got his work to pay for
Jaws, and screen readers are a lot of work....

| A helpful (Debugging-)Function is TypeName(...).
|
| Those functions I've commented with 'expects "Byte()"',
| were referring to the appropriate Variant-SubType.
|

Yes, but ADODB is sending a variant of bytes, not
a variant of array members that are variants of subtype
byte, which is what MSXML seems to need.

| > My quick test tries this:
| >
| > a = ReadBytesFromFile(arg)
| > s = Base64Encode(a)
| > WriteBytesToFile Arg & "-64.txt", s
|
| As commented in the Signature for WriteBytesToFile,
| you'll have to pass a Variant of SubType "Byte()",
| not a Variant of SubType String (your s Variable).
|

Ah. Thanks. It works fine. I didn't realize at first glance
that I needed to adapt the functions, reading or writing
bytes or strings as needed. Once I made those changes
it works fine in both directions. Nice. And that bypasses FSO.

Mayayana

2019-09-21 03:50:25 UTC

I turned this into a finished script for drag drop.
Very interesting. With a 3.6 MB file it was .1 seconds
to convert to base-64 but 10.6 seconds to convert
it back. Slower than FSO. But the docs say it does a
lot of processing when it reads in text and recommend
reading 128KB at a time. So I tried that and got a speed
of .14 seconds!

'-------------------------------------------

Dim ADO, XML, s1, A1, Arg, IfEncode, oNode
Dim T1, T2

Arg = WScript.Arguments(0)

LRet = MsgBox("Click yes to encode file or no to decode.", 36)
If LRet = 6 Then
    IfEncode = True
Else
    IfEncode = False
End If

Set XML = CreateObject("Msxml2.DOMDocument")
Set ADO = CreateObject("ADODB.Stream")

T1 = Timer

If IfEncode = True Then
    With ADO
        .Open
        .Type = 1 'Binary
        .LoadFromFile Arg
        A1 = .Read
        .Close
    End With

    Set oNode = XML.CreateElement("El")
    oNode.DataType = "bin.base64"
    oNode.NodeTypedValue = A1
    s1 = oNode.Text
    Set oNode = Nothing

    With ADO
        .Open
        .Type = 2 'text
        .WriteText s1
        .SaveToFile Arg & "-64.txt", 2 'OverWrite
        .Close
    End With
Else

    With ADO
        .Open
        .Type = 2 'text
        .LoadFromFile Arg
        Dim iA, A2()
        iA = 0
        ReDim A2(100)
        Do
            s1 = .ReadText(128000)
            If Len(s1) > 0 Then
                A2(iA) = s1
            Else
                Exit Do
            End If
            iA = iA + 1
            If iA mod 100 = 0 Then ReDim Preserve A2(iA + 100)
        Loop
        .Close
    End With
    s1 = Join(A2, "")

    Set oNode = XML.CreateElement("El")
    oNode.DataType = "bin.base64"
    oNode.Text = s1
    A1 = oNode.NodeTypedValue
    Set oNode = Nothing

    With ADO
        .Open
        .Type = 1 'binary
        .Write A1
        .SaveToFile Arg & "-64.dat", 2 'OverWrite
        .Close
    End With

End If
T2 = timer

MsgBox CStr(T2 - T1)

Set ADO = Nothing
Set XML = Nothing

Schmidt

2019-09-22 14:32:26 UTC

Post by Mayayana
I turned this into a finished script ...

I'd leave the generic Helper-Functions I've posted intact
(one can place them in - and later load them from an include-file)

With just the two additional Helpers I've mentioned
(StringToBytes and BytesToString) you'd have all you need,
to replicate your Script with this short(er) code:

'*** script-code ***
Dim File, T, bInp, sB64
File = WScript.Arguments(0)

If MsgBox("Click yes to encode (no to decode)", vbYesNo) = vbYes Then
    T = Timer
    bInp = ReadBytesFromFile(File)
    sB64 = Base64Encode(bInp)
    WriteBytesToFile File & "-64.txt", StringToBytes(sB64, "utf-8")
Else
    T = Timer
    bInp = ReadBytesFromFile(File)
    sB64 = BytesToString(bInp, "utf-8")
    WriteBytesToFile File & "-64.dat", Base64Decode(sB64)
End If

MsgBox Timer - T
'*** end of script-code ***

Ok, here again the helper-stuff (now including StringToBytes/BytesToString):

'******* a small set of generic Helper-Functions *******
'* (usually placed in and loaded from an Include-File) *
Function BytesToString(Bytes, Charset)
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = Charset
        .Type = 1: .Write Bytes: .Position = 0
        .Type = 2
        Do Until .EOS
            BytesToString = BytesToString & .ReadText(2^18)
        Loop
        .Close
    End With
End Function
Function StringToBytes(S, Charset)
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = Charset
        .Type = 2: .WriteText S: .Position = 0
        .Type = 1
        If LCase(Charset) = "utf-8" Then .Position = 3
        StringToBytes = .Read
        .Close
    End With
End Function

Function Base64Encode(Bytes) 'expects VarType "Byte()", returns a B64-String
    With CreateObject("Msxml2.DOMDocument").CreateElement("e")
        .DataType = "bin.base64"
        .NodeTypedValue = Bytes
        Base64Encode = .Text
    End With
End Function
Function Base64Decode(sBase64) 'expects a B64-String, returns VarType "Byte()"
    With CreateObject("Msxml2.DOMDocument").CreateElement("e")
        .DataType = "bin.base64"
        .Text = sBase64
        Base64Decode = .NodeTypedValue
    End With
End Function

Function ReadBytesFromFile(FileName) 'returns VarType "Byte()"
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 1 'adTypeBinary
        .LoadFromFile FileName
        ReadBytesFromFile = .Read
        .Close
    End With
End Function
Sub WriteBytesToFile(FileName, Bytes) 'expects VarType "Byte()"
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 1 'adTypeBinary
        .Write Bytes
        .SaveToFile FileName, 2 'adSaveCreateOverWrite
        .Close
    End With
End Sub
'*** end of generic script-helpers ***

HTH

Olaf
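
The ".Position = 3" in StringToBytes is there to step over the marker that the stream puts in front of utf-8 text. A small sketch can make that visible by counting the bytes a text stream holds after writing a single character. EncodedSize is a made-up helper name, and the sizes in the comments are what one would expect from the behaviour described in this thread.

```vbscript
' Count the bytes ADODB.Stream holds after writing a string in a given charset.
' EncodedSize is an illustrative helper, not part of ADODB.
Function EncodedSize(sText, sCharset)
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 2                  ' adTypeText
        .Charset = sCharset
        .WriteText sText
        EncodedSize = .Size        ' total bytes in the stream, marker included
        .Close
    End With
End Function

WScript.Echo EncodedSize("A", "utf-8")         ' 4: three marker bytes (EF BB BF) plus 41
WScript.Echo EncodedSize("A", "windows-1252")  ' 1: no marker, one byte per character
```

So for pure Base64 text, a single-byte charset gives a file with exactly one byte per character and nothing to skip.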

Mayayana

2019-09-22 16:45:49 UTC

"Schmidt" <***@vbRichClient.com> wrote

| > I turned this into a finished script ...
|
| I'd leave the generic Helper-Functions I've posted intact
| (one can place them in - and later load them from an include-file)
|

To each their own. To do both operations you end up
needing numerous helper functions. Something like 7 or 8.
Each step needs the helper function written differently.
Yet what I'm doing accomplishes the same thing in just
a few lines. And it doesn't repeatedly reference and
dereference ADODB and MSXML.

Also, I find it worthwhile to post actual working code,
not just theoretical code. That way people can just
paste what I wrote and test the functionality/speed
for themselves, without having to write their own code
from scratch. You're posting various raw materials but
not working code. What you posted doesn't actually
work as is.

So I was just turning it into working code that people
can test and time for themselves.

But the details here are also critical:
When reading in the base64 it needs to be read in as
text. That operation is amazingly slow. Reading bytes
is fast but reading a string is slow. And ADODB can't
copy bytes to a string. Note that the version I wrote is
reading in 128 KB at a time. The result is code that can
encode or decode 25 MB in less than 1 second. The older
version was taking about 4 seconds per MB and it turned
out all that lost time was on the read.
If you look up ReadText in the ADODB help there's
an explanation.

Also, if you do it the way I wrote it there's no
issue of charset. The only text being dealt with is
base64, which is ascii and the same on any computer.

So the working operation ends up being:

To encode:

read in as bytes with adodb
assign that to msxml NodeTypedValue
read out msxml text
write that to disk as text

To decode it needs to be reversed:

read in as text with adodb (128KB at a time)
assign that to msxml text
read out msxml NodeTypedValue
write that to disk as binary

Each step is unique. So in your method each
step requires instantiating a library, doing one
operation, then dereferencing (hopefully). Each
step is a function.

If you want to write yours up as working code
we can try it, but I expect yours will end up being
notably longer, a bit slower, and more brittle.

Schmidt

2019-09-22 17:41:52 UTC

Post by Mayayana
| > I turned this into a finished script ...
|
| I'd leave the generic Helper-Functions I've posted intact
| (one can place them in - and later load them from an include-file)
|
To each their own.

Goes without saying...

Post by Mayayana
To do both operations you end up needing numerous helper functions.
Something like 7 or 8.

No, it's exactly 6 (very small ones).

Post by Mayayana
Each step needs the helper function written differently.

No, as already stated, the functions are *generic*.

Post by Mayayana
Yet what I'm doing accomplishes the same thing in just
a few lines.

Nope, it's much more lines, compared to what I've posted.

Post by Mayayana
And it doesn't repeatedly reference and
dereference ADODB and MSXML.

You mean "instancing"...
And no, that is definitely not an expensive operation -
(at least not for ADODB.Stream or Msxml2.DOMDocument,
which take about 10 Micro-Seconds = 0.01 Milli-Seconds).

So, creating a new, fresh instance within each function,
is perfectly fine (no need for you, as the function-user,
to "know about" or "bother with" these Helper-Objects)
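
Anyone who wants to check the instantiation cost on their own machine could use a rough timing loop such as the following. It is only a sketch: AvgCreateMicroseconds is a made-up helper name, Timer has a coarse resolution (hence the many iterations), and the figures will differ from system to system.

```vbscript
' Rough timing sketch: average cost of creating and releasing one COM object.
' AvgCreateMicroseconds is an illustrative helper; results vary per machine.
Function AvgCreateMicroseconds(sProgId, nTimes)
    Dim i, t, o
    t = Timer
    For i = 1 To nTimes
        Set o = CreateObject(sProgId)
        Set o = Nothing
    Next
    ' Timer counts seconds, so scale the per-object average to microseconds.
    AvgCreateMicroseconds = (Timer - t) / nTimes * 1000000
End Function

WScript.Echo "ADODB.Stream: " & AvgCreateMicroseconds("ADODB.Stream", 10000)
WScript.Echo "Msxml2.DOMDocument: " & AvgCreateMicroseconds("Msxml2.DOMDocument", 10000)
```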

Post by Mayayana
Also, I find it worthwhile to post actual working code,
not just theoretical code.

Sorry, but every code-snippet I've posted in this thread,
already *is* working code - even the longer text in my last post.

But here is my replacement for your code again in a Zip:
http://vbRichClient.com/Downloads/B64Test.zip

Post by Mayayana
So I was just turning it into working code that people
can test and time for themselves.

Nope, if I may be so frank, you "murdered it"... ;-)

E.g. what you wrote out as Base64-Text in your DemoCode,
is unnecessarily blown up to UTF16-LE TextFormat,
which is twice as large as the written file needs to be.

Post by Mayayana
When reading in the base64 it needs to be read in as text.

As already said in a prior posting - nobody really writes
out a singular Base64-encoded String into the FileSystem -
you'll encounter those Strings "as parts of other stuff"
(e.g. in http-headers, or in JSON- or XML-trees)

But if you need to read or write text from/to the FileSystem,
you can use my posted (6 Base-)Functions as well...

As they are currently, you'll have to do that in two steps -
but those two steps can be combined into one line:

To read into a VB-String from an UTF8-Text-File
MyString = BytesToString(ReadBytesFromFile(File), "utf-8")

To read into a VB-String from a ANSI-Text-File
MyString = BytesToString(ReadBytesFromFile(File), "x-ansi")

Post by Mayayana
Also, if you do it the way I wrote it there's no issue of charset.

As said, since you did not specify a Charset in your Code,
there definitely *is* an issue (because you generate UTF16-LE).

Post by Mayayana
If you want to write yours up as working code...

Again, every Function I've posted definitely *is* working code.

If you think otherwise, I'd like a proper citation, containing
the snippet which (in your opinion) was not working for you -
that's considered good style in Usenet-communication.

HTH

Olaf

Mayayana

2019-09-22 18:10:25 UTC

"Schmidt" <***@vbRichClient.com> wrote

| Sorry, but every code-snippet I've posted in this thread,
| already *is* working code - even the longer text in my last post.
|

No. The first one didn't work at all. It was
just the 4 helper functions. I tried it assuming
it would work, but it needed to be rewritten.

| Nope, if I may be so frank, you "murdered it"... ;-)
|
:)

| E.g. what you wrote out as Base64-Text in your DemoCode,
| is unnecessarily blown up to UTF16-LE TextFormat,
| which is twice as large as the written file needs to be.
|

Interesting. It does no such thing on my end. Maybe that's
system-specific. We're both doing the same thing there, but
you're switching the base64 string to bytes before writing it,
and otherwise ADODB flips it to unicode? That seems very odd.
Maybe ADODB changes all strings to unicode on later systems?
That seems rather dopey, to take a base-64 ascii string and
switch it without being asked to.

I guess you could add that function to my code, but I'm
not sure it's necessary. And when I tried it I got "not
allowed in this context" at the charset assignment.

Schmidt

2019-09-22 18:46:20 UTC

Post by Mayayana
| Sorry, but every code-snippet I've posted in this thread,
| already *is* working code - even the longer text in my last post.
|
No. The first one didn't work at all.
It was just the 4 helper functions.

Well, those *exact* same functions are contained (unchanged)
in my larger example, which you just confirmed, did work.
;-)

Post by Mayayana
| E.g. what you wrote out as Base64-Text in your DemoCode,
| is unnecessarily blown up to UTF16-LE TextFormat,
| which is twice as large as the written file needs to be.
|
Interesting. It does no such thing on my end.

Please check again, I'm getting (when encoding a 3MB-TestFile), not
the expected 4MB (Base64 "blows up" the bin-content by factor 1.33),
but 8MB instead - when using your code...

(I've tested this on Win10 - as well as on a Win8-VM and also
on an old XP-VM).
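
The arithmetic behind those numbers can be written down in a few lines. This is a sketch only: Base64Chars is a made-up helper name, and the figures ignore any line breaks the encoder may insert into its output.

```vbscript
' Expected size of Base64 output, ignoring any line breaks the encoder inserts.
' Base64Chars is an illustrative helper name.
Function Base64Chars(nBytes)
    Base64Chars = 4 * ((nBytes + 2) \ 3)    ' every 3 input bytes become 4 characters
End Function

Dim n
n = 3145728                                 ' a 3 MB input file (3 * 1024 * 1024 bytes)
WScript.Echo Base64Chars(n)                 ' 4194304 characters: 4 MB as single-byte text
WScript.Echo 2 * Base64Chars(n) + 2         ' 8388610 bytes as UTF-16LE with a 2-byte marker
```

A file of roughly 8 MB instead of 4 MB is therefore what two bytes per Base64 character would produce.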

Post by Mayayana
I guess you could add that function to my code, but I'm
not sure it's necessary. And when I tried it I got "not
allowed in this context" at the charset assignment.

Then do it in the same sequence (as shown in my functions
StringToBytes and BytesToString).

As for the speed-differences:

You can bring that "up-to-yours", when you change
the BytesToString-Function this way:

Function BytesToString(Bytes, Charset)
    With CreateObject("ADODB.Stream")
        .Open
        .Charset = Charset
        .Type = 1: .Write Bytes: .Position = 0
        .Type = 2
        Dim Arr(), i: i = 0
        Do Until .EOS
            Redim Preserve Arr(i)
            Arr(i) = .ReadText(2^18)
            i = i + 1
        Loop
        .Close
        BytesToString = Join(Arr,"")
    End With
End Function

I've used "normal String-Concats", because it was entirely
sufficient for my usage at the WebServer-side (in Classic-ASP),
because I've never had to deal with Files larger than 2MB or so
on the server-side.

Another (slighter) performance-increase can be accomplished,
when you change the Charset-Encoding I was using in my original
"Main-Script-Code" (at the top) from "utf-8" to "x-ansi".

After these two changes, the two versions should perform
absolutely identical (perhaps mine being a bit faster now,
because it has only to write + later read "half the bytes"
from the intermediate Base64-TextFile).

...

Finally, Code-Constructs like:

With CreateObject("ADODB.Stream")
'...
End With

Are perfectly fine, because (as said) - the instantiation
via CreateObject(...) takes usually only about 10 Micro-Seconds.

And as for "Destroying the instance" - that's actually
"the beauty" of the With-Construct...
Since it destroys the COM-instance in question exactly at the
point of "End With" (no explicit "Set Nothing" is required).

HTH

Olaf

Mayayana

2019-09-22 19:09:09 UTC

"Schmidt" <***@vbRichClient.com> wrote

| > | E.g. what you wrote out as Base64-Text in your DemoCode,
| > | is unnecessarily blown up to UTF16-LE TextFormat,
| > | which is twice as large as the written file needs to be.
| > |
| >
| > Interesting. It does no such thing on my end.
|
| Please check again, I'm getting (when encoding a 3MB-TestFile), not
| the expected 4MB (Base64 "blows up" the bin-content by factor 1.33),
| but 8MB instead - when using your code...
|

This is intriguing. And annoying. My code was fine until
I tried yours with the utf-8. It then started saving as
unicode, like you said. After some searching online it
seems that ADODB 1) sniffs text and decides what it
should be and 2) maintains some kind of memory as to its
default format.

It seems to solve the problem if I just set Charset.
So with encode, when writing it back to disk, I do it like so:

With ADO
    .Open
    .Type = 2 'text
    .Charset = "windows-1252"
    .WriteText s1
    .SaveToFile Arg & "-64.txt", 2 'OverWrite
    .Close
End With

That works fine and eliminates the problem of utf-8
encoding putting a 3-byte marker into the file. And
it shouldn't matter that the encoding is English since
Base64 will all be characters under 128 and therefore
the same regardless of codepage.

Similarly, when I load the file for decoding, I do the
same:

With ADO
    .Open
    .Type = 2 'text
    .Charset = "windows-1252"
    .LoadFromFile Arg
    Dim iA, A2()

My resulting speeds for the 25 MB file are .75 and 1.0.
Pretty close to the original. With your updated code I'm
getting .95 and 1.51. Pretty close again. About 1/3 to 1/2
slower, but insignificant in real usage.

So the only real difference is that my code is pretty
and easy to read, while yours is confusing and needs
editing before use. :)

But it is discouraging that ADODB is as undependable
in its mangling of text as FSO is. However, if Charset
can be used to make it behave then that helps.

Schmidt

2019-09-22 20:30:22 UTC

Post by Mayayana
| > | E.g. what you wrote out as Base64-Text in your DemoCode,
| > | is unnecessarily blown up to UTF16-LE TextFormat,
| > | which is twice as large as the written file needs to be.
| > |
| >
| > Interesting. It does no such thing on my end.
|
| Please check again, I'm getting (when encoding a 3MB-TestFile), not
| the expected 4MB (Base64 "blows up" the bin-content by factor 1.33),
| but 8MB instead - when using your code...
|
This is intriguing. And annoying. My code was fine...

No, it wasn't - because in your prior code you did *not* set
the Charset-Property of the Stream-Object - and thus used
the default (which is "Unicode" aka "UTF16-LE" on Windows).

Post by Mayayana
...until I tried yours with the utf-8. It then started saving
as unicode, like you said.

Arrgh, could you please stick to the truth in a public NewsGroup.
Such "face-saving excuses" are exactly, how "misleading myths"
will take root.

Post by Mayayana
After some searching online it seems that ADODB
1) sniffs text and decides what it should be and
2) maintains some kind of memory as to its default format.

Nope, the ADO.Stream Object does no such thing.
It was (as said above), just using its default-charset,
which you failed to specify explicitly (in both -
the read- and write-directions).

Post by Mayayana
My resulting speeds for the 25 MB file are .75 and 1.0.
Pretty close to the original. With your updated code I'm
getting .95 and 1.51.

I've done such a test now (after finding a 26MB-file here),
and the speeds were - as expected - identical (using my -
as well as your updated versions).

Post by Mayayana
So the only real difference is that my code is pretty
and easy to read, while yours is confusing and needs
editing before use. :)

No - you can ridicule my coding-style as you like -
but that does not change the fact, that your code (still)
is a perfect example for "spaghetti", seriously.

Really - it's not the coding-style which decides about
"spaghetti" or not, it's whether you applied "modularization",
KISS- and DRY-principles.

On my WebServer I use these VBScript-functions I've posted
via an Include-File (which ASP supports, but the WSH sadly not).

And thus my "UserCode" (the one that I've had to type in my Editor)
is really only this one here (leaving out the Timing-Code):

'--------------------
Dim File, bInp, sB64
File = WScript.Arguments(0)
bInp = ReadBytesFromFile(File)

If MsgBox("Click yes to encode (no to decode)", vbYesNo) = vbYes Then
    sB64 = Base64Encode(bInp)
    WriteBytesToFile File & "-64.txt", StringToBytes(sB64, "x-ansi")
Else
    sB64 = BytesToString(bInp, "x-ansi")
    WriteBytesToFile File & "-64.dat", Base64Decode(sB64)
End If
'------------------

Furthermore, my (relatively unnecessary) fix for a bit more speed,
was only applied in a single small Function, which is also
documenting the functionality of its handful of Lines over
its given Function-name (also later in UserCode, by using that Name).

All the other UserCode (spread over dozens of other *.asp-Files),
will now automatically profit from that little speed-up-change,
because it was done on an *include-file*.

Whereas in such non-generic code as yours, you had to actually
make a change in *two* (much harder to find) places, to fix "a bug".

And no other CodeModule of yours will profit from that change
(who knows, where else you used such non-function-encapsulated stuff).

Post by Mayayana
But it is discouraging that ADODB is as undependable
in its mangling of text as FSO is.

As already said, such statements will only create further myths
in the community - nobody really needs or wants that...

Please run the following Single-Line-Script:
MsgBox CreateObject("ADODB.Stream").Charset

It will answer with: "Unicode"
(which is synonymous with "UTF16-LE" - and this will be reported
also on machines with an english locale - I've just checked that here)

Olaf

Mayayana

2019-09-22 19:10:38 UTC

I must say I'm glad we had this debate. I had no
idea that ADODB might pull tricks with text formatting
and might never have noticed if you hadn't had trouble
with my script.

Schmidt

2019-09-22 20:32:21 UTC

Post by Mayayana
I must say I'm glad we had this debate. I had no
idea that ADODB might pull tricks with text formatting
and might never have noticed if you hadn't had trouble
with my script.

Just another cleanup-sweep... before myths evolve...

Please run the following Single-Line-Script:
MsgBox CreateObject("ADODB.Stream").Charset

It will answer with: "Unicode"
(which is synonymous with "UTF16-LE" - and this will be reported
also on machines with an english locale - I've just checked that here)

Olaf

Mayayana

2019-09-22 17:45:11 UTC

"Schmidt" <***@vbRichClient.com> wrote

| With just the two additional Helpers I've mentioned
| (StringToBytes and BytesToString) you'd have all you need,
| to replicate your Script with this short(er) code:
|

I didn't realize until I re-read it that you actually had
posted a working script. Thanks. But I don't see why you think
it's better. All those external functions seem clunky to me.
If I want base-64 code that's portable then it's easy enough
to put mine in a class.

But I think that's really just a matter of personal preference.
I also don't like the method of mashing multiple operations
together in order to have fewer lines of code. It's hard to read.
Things like With CreateObject(... just encourage bad coding,
while providing no advantage. And it leaves no option to
explicitly dereference.

But I know some people prefer it. There's no accounting
for taste. :)

Speed:

On a 25 MB file my code encodes
in .7 seconds and decodes in .8 seconds. Yours, with
the same file, takes 1.0 and 5.28 seconds respectively.
The lag is mostly in BytesToString. You didn't try this code,
I gather?

I tried taking the loop out of BytesToString, thinking
that maybe ADODB wouldn't be so slow converting its
own stream to a string. But that was horrendous. On
the 25 MB file I finally just killed the process after
a couple of minutes.

JJ

2019-09-21 19:07:08 UTC

Post by Schmidt
Not sure, why there's so much "confusion" about this
(and why one should read binary FileData into a String first).
The modus-operandi with Base64 is Binary:
- passed as Input (as ByteArray) into the encoder
- and returned as Output-(ByteArray) from the decoder
Here's some (pretty symmetrical) Helpers,
which do as they should in my *.asp-Scripts (on Win2008/Win2012 and
Win2016):

Thanks. I guess I failed to realize that ADODB.Stream is much better suited
for handling binary data.
