Discussion:
How to economically(!) remove a range of unknown(!) characters from a string
R.Wieser
2017-01-03 12:40:05 UTC
Hello All,

I need to remove, in a string, a number of characters between two points.
Currently I'm doing that with

sText = left(sText, RangeBegin-1) & mid(sText, RangeEnd+1)

, but as I have to do that quite a few times in a rather large string (1
MByte) the whole thing gets mighty slow (probably because of the
again-and-again recreation of the string).

So, I'm looking for a more economical (read: faster) way to do the same.

The problem is that VBScript doesn't do simple overwriting, and the
"replace" command does not accept a start position -- or rather it does, but
that one throws everything away before that point, making it useless to me.
:-\
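Here is a small sketch of the Replace behaviour described above. The start argument is honoured, but the returned string begins at that position, so the prefix has to be glued back on with Left. ReplaceFrom is a made-up helper. It still builds a brand-new string, so it does nothing for the speed problem itself.

```vbscript
' ReplaceFrom is a hypothetical helper, not a built-in.
' Replace(expression, find, replacewith, start, count) returns only the
' part of the string from "start" onwards, so the prefix is re-attached.
Function ReplaceFrom(sText, sFind, sWith, iStart)
    ReplaceFrom = Left(sText, iStart - 1) & Replace(sText, sFind, sWith, iStart, 1)
End Function

WScript.Echo Replace("abcabc", "a", "X", 3)       ' cXbc   (prefix is gone)
WScript.Echo ReplaceFrom("abcabc", "a", "X", 3)   ' abcXbc (prefix restored)
```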

Does anyone have an idea how to do it ?

Regards,
Rudy Wieser
Mayayana
2017-01-03 13:43:28 UTC
"R.Wieser" <***@not.available> wrote


| I need to remove, in a string, a number of characters between two points.

I find that string operations often take most of the time
in a long script. Replace is actually extremely slow because
it allocates sloppily. There are two methods I use to speed
it up: String building with arrays and UCase (or LCase) before
doing operations like tokenizing. I got the original idea from
Matthew Curland's book on advanced VB6. He was one of the
developers of MS VB. Curland pointed out that if you have
a string and keep adding one character it will quickly bog
down because each addition requires a new allocation of
string-length + 1 bytes. (Or double that for unicode.) Join,
by contrast, walks the array to calculate the required string
length and then makes a single allocation. So while s = s & x
seems simple, adding small strings to an array and then calling
Join is actually a far leaner operation.
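This is a rough way to see the difference Mayayana describes. N is arbitrary, and timings depend on the machine and the script host, so no figures are claimed here. Both halves build the same string.

```vbscript
' Build the same N-character string two ways and time both.
Const N = 20000
Dim i, t0, s, s2
Dim parts()

' 1) Repeated concatenation: each "&" allocates a new, longer string.
t0 = Timer
s = ""
For i = 1 To N
    s = s & "x"
Next
WScript.Echo "s = s & x     : " & (Timer - t0) & " s"

' 2) Fill a pre-sized array, then let Join build the result once.
t0 = Timer
ReDim parts(N - 1)
For i = 0 To N - 1
    parts(i) = "x"
Next
s2 = Join(parts, "")
WScript.Echo "array + Join  : " & (Timer - t0) & " s"
```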

For what you're doing I'd try the following. It seems awkward
but will do fewer large memory allocations, which is typically
what slows things down. Create an array whose UBound is the string
length. Walk the string, putting each character into the array
unless you want it removed. Finally, do a Join on the array.
You can also walk the string until you get to a non-wanted
character and then take what's left:

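str1 = "banana split"   ' sample input; every "a" is dropped
SLen = Len(str1)
ReDim aParts(SLen)      ' worst case: one fragment per character
iParts = 0
i = 1
iCounter = 0            ' length of the current run of wanted characters
Do While i <= SLen
  s = Mid(str1, i, 1)
  Select Case s
  Case "a"
    'retrieve characters since last "a" and add to array.
    If iCounter > 0 Then
      aParts(iParts) = Mid(str1, i - iCounter, iCounter)
      iParts = iParts + 1
    End If
    iCounter = 0
  Case Else
    'wanted character: just extend the current run
    iCounter = iCounter + 1
  End Select
  i = i + 1
Loop
' at end, retrieve any characters after last "a"
If iCounter > 0 Then aParts(iParts) = Mid(str1, SLen - iCounter + 1, iCounter)
' Then Join (unused slots are Empty and add nothing)
str2 = Join(aParts, "")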

I use a similar method in VB6 for color-coding
VBScript, HTML, etc. in a RichEdit window. I can
convert plain text to color-coded RTF text very
quickly. I aim for under 250 ms and usually achieve
that for files less than 100 KB. If I use any Replace
calls it greatly increases the time.

A VBS sample is here:
http://www.jsware.net/jsware/scrfiles.php5#jsdeob

I wrote it so that I can get a better idea of what
script is doing on heavily obfuscated or computer-
generated pages. Sometimes I use it to pick out
relevant links.
It uses a similar tokenizing method to optimize the
speed of converting obfuscated JavaScript to a
clearer format, with color-coding of variables, strings,
comments and keywords. The output is HTML that
then gets displayed. You can test it by copying some
muck from a typical corporate webpage and pasting it
into the top text window. The de-obfuscated text appears
below and the time taken shows at the bottom of the window.
R.Wieser
2017-01-03 16:21:31 UTC
Mayayana,
| Create an array of UBound string length. Walk the string, putting
| each character into the array unless you want it removed. Finally,
| do a Join on the array.
I remember having read of the "put into an array, join afterwards" method,
but forgot all about it. Although I'm not replacing/removing chars but
chunks, the principle stays the same. Let's see if VBS knows about
dynamically-sized arrays ...
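The same principle can be applied to chunks instead of characters. This sketch walks a list of ranges, keeps the text between them as fragments, and Joins once at the end. RemoveRanges, aBegin and aEnd are hypothetical names. The ranges use the 1-based, inclusive RangeBegin/RangeEnd convention of the original one-liner. They are assumed to be sorted and non-overlapping, because otherwise Mid gets a negative length and raises an error.

```vbscript
' RemoveRanges is a hypothetical helper. aBegin(k)/aEnd(k) are 1-based,
' inclusive positions of the k-th range, sorted and non-overlapping.
Function RemoveRanges(sText, aBegin, aEnd)
    Dim parts(), n, k, iLast
    ReDim parts(UBound(aBegin) + 1)   ' one kept chunk per range, plus the tail
    n = 0
    iLast = 1                         ' first character not yet copied
    For k = 0 To UBound(aBegin)
        parts(n) = Mid(sText, iLast, aBegin(k) - iLast)   ' text before range k
        n = n + 1
        iLast = aEnd(k) + 1           ' skip the range itself
    Next
    parts(n) = Mid(sText, iLast)      ' everything after the last range
    RemoveRanges = Join(parts, "")    ' a single Join builds the result
End Function

WScript.Echo RemoveRanges("abcXXdefYYghi", Array(4, 9), Array(5, 10))   ' abcdefghi
```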

I could put the chunks into a dictionary object, afterwards exporting them
to an array and joining them, and possibly _still_ be faster and need less memory
:-)

(I also tried to create an ActiveX object to implement a string-overwriting
method, but somehow could not get the BStr or Variant transferred to it "by
reference" :-( )

I just remembered why I didn't like that method: every char costs a Variant
to store. In other words: to store a 1KChar string you need (at least) 12
KByte worth of array space ...

Thanks for the reminder.

Regards,
Rudy Wieser
Mayayana
2017-01-04 00:33:32 UTC
"R.Wieser" <***@not.available> wrote

| I just remembered why I didn't like that method : every char costs a
| Variant
| to store. In other words: to store a 1KChar string you need (at least) 12
| KByte worth of array space ...

Yes. Actually I think it's 16 bytes. And that's just to store
the pointer. :) Though I'm not sure how that works in VBS
when it's in an array. It may assume string pointers as array
elements. Either way, it's very fast. Small allocations are cheap.
If you end up trying the other methods and testing, it will
be interesting to hear the results.
R.Wieser
2017-01-04 09:36:37 UTC
Mayayana,
Post by Mayayana
| Yes. Actually I think it's 16 bytes.
You're right, and something I also remembered. But when double-checking I
somehow mis-added the size of the fields in the structure. :-(
Post by Mayayana
| It may assume string pointers as array elements.
I think it has to, as there doesn't seem to be a VT_?? for a 2-byte
(unicode) character available. Which is a shame, as 8 bytes of usable space
is more than enough to store such a char, or even a few of them.

But that means that every char in such an array costs at least 22 bytes (16
bytes for the variant, 6 (8?) bytes for the char itself). :-\
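Here is the rough arithmetic behind these numbers. It assumes a 32-bit process, where a VARIANT is 16 bytes (24 bytes in a 64-bit process). It also assumes the usual BSTR layout: a 4-byte length prefix, the UTF-16 character data, and a 2-byte terminator. Heap overhead and allocator rounding come on top and are not counted.

```latex
\underbrace{16}_{\text{VARIANT}}
+ \underbrace{4}_{\text{length prefix}}
+ \underbrace{2}_{\text{UTF-16 char}}
+ \underbrace{2}_{\text{terminator}}
= 24\ \text{bytes per one-character element}

10^{6}\ \text{elements} \times 24\ \text{bytes} \approx 24\ \text{MB}
```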
Post by Mayayana
| Either way, it's very fast. Small allocations are cheap.
True, and when the array has been pre-declared (doesn't need to grow) not
much memory-scrunching needs to be done (just adding of single chars).
Post by Mayayana
| If you end up trying the other methods and testing, it will
| be interesting to hear the results.
I've been trying a few methods, but splitting the string up into an array
was not one of them: I need to be able to search the string for keywords
(HTML tags to be precise).

Although a triple REPLACE (to normalize EOLs to CRLF combinations) did just
cost 0.01 second over that 1MByte of chars, deleting lots of ranges cost a
_lot_ more.
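The three Replace calls aren't shown in the post. One order that produces CRLF everywhere, sketched here with a made-up function name, is to fold CRLF and lone CR into LF first and then expand every LF. The order matters: expanding LF first would turn an existing CRLF into CR CR LF.

```vbscript
' NormalizeEol is a hypothetical name; the order of the calls matters.
Function NormalizeEol(sText)
    Dim s
    s = Replace(sText, vbCrLf, vbLf)          ' CRLF -> LF
    s = Replace(s, vbCr, vbLf)                ' lone CR -> LF
    NormalizeEol = Replace(s, vbLf, vbCrLf)   ' every LF -> CRLF
End Function
```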

Interestingly enough, running the same function (writing results to file and
reloading that file) with different arguments (tag names) showed a
wide range of resulting times: from 16 ms per replacement for the slowest
to just 6 ms for the fastest.

The array method showed pretty-much the same results.

I just re-ran the old/first in-memory deleting method, and although it takes
a bit longer, the difference is not the saving I imagined it would be:
~3000 replacements taking ~40 seconds, against ~34 seconds using either the
array or file method.

Yep, VBS is in dire need of having a *usable* REPLACE command. :-((

Regards,
Rudy Wieser
R.Wieser
2017-01-04 11:21:00 UTC
I realized that, when complaining about the in-string removing of a range
of chars, I forgot to check what time it cost to just run the search loop --
without removing anything from the string.

And boy, that was an eye-opener: the loop itself took about 2/3 of the time I
mentioned. :-( Even after optimizing the caseless tag-finding to first
find an opening bracket and only then do a caseless tag check, the loop
itself still seems to take up half of the involved time.
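Here is a sketch of the cheaper search described above, using a hypothetical FindTag helper. A binary InStr looks for "<", and the tag name is compared case-insensitively only at those positions. It checks only the name's prefix, so "div" would also match "<divider". A real version would check the next character too.

```vbscript
' FindTag is a hypothetical helper: returns the position of the "<" that
' starts a tag named sTag (any case) at or after iStart, or 0 if none.
Function FindTag(sText, sTag, iStart)
    Dim i
    i = InStr(iStart, sText, "<", vbBinaryCompare)   ' fast binary scan for "<"
    Do While i > 0
        ' Caseless compare only at candidate positions.
        If StrComp(Mid(sText, i + 1, Len(sTag)), sTag, vbTextCompare) = 0 Then
            FindTag = i
            Exit Function
        End If
        i = InStr(i + 1, sText, "<", vbBinaryCompare)
    Loop
    FindTag = 0
End Function

WScript.Echo FindTag("ab<P>x<DIV class=a>", "div", 1)   ' 7
```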

Thanks, Murphy's Law (make sure you check everything before coming to
conclusions) :-( :-)

Regards,
Rudy Wieser
Mayayana
2017-01-04 13:31:11 UTC
"R.Wieser" <***@not.available> wrote

|I realized that, when complaining about the in-string removing of a range
| of chars, I forgot to check what time it cost to just run the search
| loop --
| without removing anything from the string.
|
| And boy, that was an eye-opener: The loop itself took about 2/3 of time I
| mentioned. :-( Even when optimizing the caseless tag-finding to first
| finding an opening bracket and only than do a caseless tag check the loop
| itself still seems to take up half of the involved time.
|

InStr? That's surprising.

I assume you know this and it's just a matter of
terminology, but the "array method" I use doesn't
do any searching of arrays. It's part of a tokenizer
routine. The array is only used to build the new string,
in order to avoid concatenation. I've found that
tokenizers, despite being very bulky, are extremely
fast.

ANSI string: VT_BSTRT
For what it's worth. VBS has no access to such things.
The operations are designed to be transparent, so that
all we need to know is that it's a *character* string. Even
the local codepage is transparent, so the idea of bytes
doesn't really apply unless it's an ANSI file, with a non-
multi-byte codepage, handled very carefully. And Windows
has been unicode internally for a long time, anyway.

Without direct access to memory addresses, and with
no access to data types, there's not a lot we can do to
make script more efficient.
R.Wieser
2017-01-04 16:27:16 UTC
Mayayana,
Post by Mayayana
| InStr? That's surprising.
Yep, to me too. But then again, AFAIK unicode characters can be more than
one unit long, making it hard(er) to do character/string comparisons.
Post by Mayayana
| but the "array method" I use doesn't do any searching of arrays.
I know that it's just used to keep track of characters (or, in my case,
string fragments), and to join those chars(/string fragments) afterwards.

I just wanted to make clear that splitting the string into chars beforehand
would not work for me, as I need to be able to search the string for
word patterns. So, I search the string for whatever I need, and store the
part ranging from the last to the current position as a fragment into the
array (after which I skip a few chars and start searching again).
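Here is the search-and-collect loop described above, sketched for the HTML case. StripBlocks is a hypothetical helper. It drops everything from each sOpen up to and including the next sClose (e.g. "<script" ... "</script>") and stores the text in between as fragments. The fragment array starts small and doubles with ReDim Preserve, which is one assumption about how to size it. vbTextCompare makes the search caseless at some cost in speed.

```vbscript
' StripBlocks is a hypothetical helper: removes every sOpen ... sClose block.
Function StripBlocks(sText, sOpen, sClose)
    Dim parts(), n, iLast, iOpen, iClose
    ReDim parts(15)
    n = 0
    iLast = 1                                    ' first character not yet kept
    Do
        iOpen = InStr(iLast, sText, sOpen, vbTextCompare)
        If iOpen = 0 Then Exit Do
        iClose = InStr(iOpen + Len(sOpen), sText, sClose, vbTextCompare)
        If iClose = 0 Then Exit Do               ' unterminated block: keep it
        If n > UBound(parts) Then ReDim Preserve parts(2 * UBound(parts) + 1)
        parts(n) = Mid(sText, iLast, iOpen - iLast)   ' text before the block
        n = n + 1
        iLast = iClose + Len(sClose)             ' continue after the closing delimiter
    Loop
    If n > UBound(parts) Then ReDim Preserve parts(n)
    parts(n) = Mid(sText, iLast)                 ' tail after the last block
    StripBlocks = Join(parts, "")                ' unused slots are Empty and add nothing
End Function

WScript.Echo StripBlocks("a<script>x</script>b<SCRIPT>y</SCRIPT>c", "<script", "</script>")   ' abc
```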
Post by Mayayana
| ANSI string: VT_BSTRT
Thanks for that. Never seen or heard about it.

Though a quick google (to see what the constant's value is) revealed that it's
actually VT_BSTRA that's meant to indicate 8-bit character strings, and that
VT_BSTRT is probably comparable to a T_STR, meaning that it's a W string on
16-bit, and an A string on 8-bit char systems.

But you caused me another problem: VT_BSTRA seems to have the constant value
Post by Mayayana
| Without direct access to memory addresses, and with
| no access to data types, there's not a lot we can do to
| make script more efficient.
True. As I mentioned, I already tried to directly access the unicode string
through a home-built ActiveX component. But even when I defined the argument
as [in, out] (in the IDL file), I could change the string all I wanted in the
ActiveX component, but I did not see it change in the script. :-\

I also rewrote the code in Assembly (loading the whole file into memory and
handling it there), which brought the time down to less than 1.5 seconds for
_several_ search-and-replace actions. :-) But it's way less flexible. :-\

Notwithstanding the above, I still want to know about methods to speed up
VBScript's string handling though.

Regards,
Rudy Wieser
Evertjan.
2017-01-04 17:13:21 UTC
Post by R.Wieser
| Notwithstanding the above, I still want to know about methods to speed up
| VBScript's string handling though.
The Regex replace method, most probably being the same method as used in the
JavaScript engine in Cscript/Wscript, in classic ASP and in IE [and Edge??],
as there is only a single JS/VBS engine serving both [is this true?], seems
to be intrinsically much faster than conventional string manipulation with
left(), mid() and right().

It is a pity that VBS has only the mid() function and has lost the mid()
statement found in earlier forms of Basic, because there the swap-function
would have looked like:

temp = mid(txt,n1,1)
mid(txt,n1,1) = mid(txt,n2,1)
mid(txt,n2,1) = temp
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
R.Wieser
2017-01-04 20:53:44 UTC
Evertjan,
| The Regex replace method .. seems to be intrinsically much faster
| than conventional string manipulation with left(), mid() and right().
I believe you. But the old adage seems to hold true:

# You have a problem.
# You think "That's something a RegExp can do!".
# Now you have _two_ problems.

In my case I tried to use - and expand on - your suggestion, only to come to
the realisation that I do not even know how to read that pattern you posted,
let alone know what it actually does.

Though I think that if I knew how to use RegExp I would not bother to
have it just remove a part of the string, but have it look for the
delimiters too, and that repeatedly over the whole string. :-)
| It is a pity that VBS has only the mid() function and has lost the
| mid() statement found in earlier forms of Basic, because there the
I do not know why you are mentioning a swap function, but yes, it's a pity.
Using it to remove substrings would have been nice. Being able to replace
a substring with another one of an equal length would have been nice too.

... Although, as we are talking about unicode strings here, there is no
telling how many bytes a substring actually occupies, and that makes it very
hard to do such a one-on-one character replacement without memory movement
being needed anyway.

Also, if it still needs a source and destination string (instead of being
able to do it in-place) I still would need to re-create the string
over-and-over again, most likely running into the same problem as with my
original solution (though it would most likely be faster).

Regards,
Rudy Wieser
Evertjan.
2017-01-04 23:33:52 UTC
Post by R.Wieser
| In my case I tried to use - and expand on - your suggestion, only to
| come to the realisation that I do not even know how to read that pattern
| you posted, let alone know what it actually does.
Well, you could ask, or you could try
to learn by experiment and reading the specs.
R.Wieser
2017-01-05 08:19:54 UTC
Evertjan,
| you could try to learn by experiment and
| reading the specs.
:-) There is an end to my ability to absorb stuff. And it looks like
that RegExp simply isn't sticking to my wetware. And believe me, I've
tried.

I've got several programs that use it, including PHP. I can create
expressions all I want, but quite often the result surprises me.

Regards,
Rudy Wieser
Evertjan.
2017-01-05 09:10:03 UTC
Post by R.Wieser
| | you could try to learn by experiment and
| | reading the specs.
| :-) There is an end to my ability to absorb stuff. And it looks like
| that RegExp simply isn't sticking to my wetware. And believe me, I've
| tried.
| I've got several programs that use it, including PHP.
[in PHP a programme?]
Post by R.Wieser
| I can create
| expressions all I want, but quite often the result surprises me.
Set myRegExp = New RegExp
myRegExp.Pattern = "^(.{" & RangeBegin & "}).{" & Range & "}"
sText = myRegExp.replace(sText,"$1")
response.write sText
Now:

"^(.{6}).{2}"

means:

^ start matching at the beginning of the string
. match any single character
.{6} match group of 6 of any single character
(.{6}) remember this group for pasting as $1
.{2} match group of 2 of any single character

.replace(sText,"$1")

will do:

replace the first 8 characters [the "match"]
with the first 6 remembered.

so if
sText = "12345678"
the returned string will be:
"123478"

====================

"^(.{6}).{2}"

could also be written as:

"^(......).."

but then it would be [more] difficult
to insert the numbers 6 and 2 dynamically
as I did here:

"^(.{" & RangeBegin & "}).{" & Range & "}"

==============
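Here is Evertjan's pattern, sketched as a function that takes the 1-based, inclusive RangeBegin/RangeEnd of the original question, so $1 holds RangeBegin - 1 characters. Two assumptions should be flagged. First, "." in VBScript's RegExp does not match a line break, so [\s\S] is used to count every character of a multi-line HTML string. Second, very large {n} counts on a 1 MByte string have not been tested here. If the range runs past the end of the text, the pattern does not match and the text comes back unchanged.

```vbscript
' RemoveRangeRe is a hypothetical helper. RangeBegin and RangeEnd are
' 1-based and inclusive, as in the original Left/Mid one-liner.
Function RemoveRangeRe(sText, RangeBegin, RangeEnd)
    Dim re
    Set re = New RegExp
    ' Keep the first RangeBegin-1 characters as $1, then match the range itself.
    re.Pattern = "^([\s\S]{" & (RangeBegin - 1) & "})[\s\S]{" & (RangeEnd - RangeBegin + 1) & "}"
    RemoveRangeRe = re.Replace(sText, "$1")
End Function

WScript.Echo RemoveRangeRe("12345678ABC", 7, 8)   ' 123456ABC
```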
Evertjan.
2017-01-05 09:34:12 UTC
Post by Evertjan.
Post by R.Wieser
| | | you could try to learn by experiment and
| | | reading the specs.
| | :-) There is an end to my ability to absorb stuff. And it looks like
| | that RegExp simply isn't sticking to my wetware. And believe me, I've
| | tried.
| | I've got several programs that use it, including PHP.
| [in PHP a programme?]
[is PHP a programme?]

;-(
Post by Evertjan.
Post by R.Wieser
| | I can create
| | expressions all I want, but quite often the result surprises me.
| Set myRegExp = New RegExp
| myRegExp.Pattern = "^(.{" & RangeBegin & "}).{" & Range & "}"
| sText = myRegExp.replace(sText,"$1")
| response.write sText
| "^(.{6}).{2}"
| ^ start matching at the beginning of the string
| . match any single character
| .{6} match group of 6 of any single character
| (.{6}) remember this group for pasting as $1
| .{2} match group of 2 of any single character
| .replace(sText,"$1")
| replace the first 8 characters [the "match"]
| with the first 6 remembered.
| so if
| sText = "12345678"
| "123478"
Sorry, my mistake:

so if
sText = "12345678ABC"
the returned string will be:
"123456ABC"
Post by Evertjan.
| ====================
| "^(.{6}).{2}"
| "^(......).."
| but then it would be [more] difficult
| to insert the numbers 6 and 2 dynamically
| "^(.{" & RangeBegin & "}).{" & Range & "}"
| ==============
R.Wieser
2017-01-05 10:50:41 UTC
Evertjan,
Post by Evertjan.
| [in PHP a programme?]
(I take it that first word was meant to be "is")

Yes, definitely. You _can_ use it stand-alone, you know. :-)
Thank you for the explanation (yes, really). I see that I overcomplicated
what I was seeing there. :-\

I was thinking of how to put a string in the middle of that (an HTML remark
mentioning the fact that something was removed), and how that would be
non-trivial (to me), as it would become part of the expression ...
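On putting a marker in the middle: the marker only needs to go into the replacement string, not into the pattern, so no regex escaping is involved. Here is a sketch with a hypothetical ReplaceRangeRe helper, using 1-based, inclusive range positions. [\s\S] is used because "." does not match line breaks. The marker should avoid "$", which the replacement string treats specially.

```vbscript
' ReplaceRangeRe is a hypothetical helper: deletes characters RangeBegin..RangeEnd
' (1-based, inclusive) and puts sMarker in their place.
Function ReplaceRangeRe(sText, RangeBegin, RangeEnd, sMarker)
    Dim re
    Set re = New RegExp
    re.Pattern = "^([\s\S]{" & (RangeBegin - 1) & "})[\s\S]{" & (RangeEnd - RangeBegin + 1) & "}"
    ' $1 re-inserts the kept prefix; the marker follows it as plain text.
    ReplaceRangeRe = re.Replace(sText, "$1" & sMarker)
End Function

WScript.Echo ReplaceRangeRe("12345678ABC", 7, 8, "<!-- removed -->")
' 123456<!-- removed -->ABC
```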

Regards,
Rudy Wieser
Mayayana
2017-01-05 14:44:06 UTC
"R.Wieser" <***@not.available> wrote

| > The Regex replace method .. seems to be intrinsically much faster
| > than conventional string manipulation with left(), mid() and right().
|
| I believe you.

That's unfortunate. I was hoping you'd spend
2 or 3 days of grunt work, thoroughly testing
all options, then reporting the results so that
we'll all know. :)

Personally, though, I consider life too short
for regex. Even if you found a slight improvement
over tokenizing I wouldn't turn to regex in the
future. If necessary I do what you were doing:
Write the speed functionality in compiled code,
to be called by VBS. But in general I don't find
that VBS is too slow for anything I need it to
do, as long as I optimize it.
Evertjan.
2017-01-05 15:50:25 UTC
Post by Mayayana
| > The Regex replace method .. seems to be intrinsically much faster
| > than conventional string manipulation with left(), mid() and right().
|
| I believe you.
| That's unfortunate. I was hoping you'd spend
| 2 or 3 days of grunt work, thoroughly testing
| all options, then reporting the results so that
| we'll all know. :)
| Personally, though, I consider life too short
| for regex. Even if you found a slight improvement
| over tokenizing I wouldn't turn to regex in the
| Write the speed functionality in compiled code,
| to be called by VBS. But in general I don't find
| that VBS is too slow for anything I need it to
| do, as long as I optimize it.
I have lived for a long long time,
and see a posting from myself of
04 Aug 2003 23:15:39
giving a regex suggestion.

Regex is not a new language,
it is just a method for doing
string testing and manipulation efficiently.

But if you consider programming just a means to an end,
and not a joy in itself, well perhaps,
but not as an answer to the subject line "economically(!)".

I do not consider life too short for learning
and hope to learn for what is left of it.
R.Wieser
2017-01-05 16:29:01 UTC
Evertjan,
Post by Evertjan.
| But if you consider programming just a means to an end,
Not really. There is absolutely _heaps_ I still want to try and do, even if
I just limit myself to programming*, enough to last me this lifetime and
then some.

To take up something like RegExp I must have a direct necessity for it, or a
feeling that I will enjoy it above what I already have waiting.

*Going out to enjoy a pint, reading, electronics and microcontrollers,
friends and viewing a movie or two all want some of my time too. :-)

Regards,
Rudy Wieser
Mayayana
2017-01-05 16:50:30 UTC
"Evertjan." <***@inter.nl.net> wrote

| > Personally, though, I consider life too short
| > for regex.

| I have lived for a long long time,
| and see a posting from myself of
| 04 Aug 2003 23:15:39
| giving a regex suggestion.
|

I'm not at all surprised. Regex is your answer
to most things. I've never understood the religious
appeal of regex, but maybe it's the unusual design.
It puts off some people and tickles others.

| But if you consider programming just a means to an end,
| and not a joy in itself, well perhaps,

I do enjoy it. That's why no regex for me. :)

| I do not consider life too short for learning
| and hope to learn for what is left of it.
|

Perhaps you haven't lived as long as you think,
if you think that way. It's instant. Since time does
not exist per se, no matter how long you live,
there's no duration. You can think back on your
death bed about former memories, but that's just
thoughts. Whether you live 10 years or 100, there's
still no time when the end comes, and it's always
now in between. The rest is concepts. Which is why
elderly people so often remark that "it all went by
so quickly!".

But I didn't mean to be so philosophical. "Life's
too short for...." is just an American expression.
It's used to humorously describe unpleasant,
distasteful, or tedious things. Life's too short to
watch Dancing with the Stars. Life's too short to
use a computer in console mode. Life's too short
for regrets. Life's too short for regex. .... Those
are just a few obvious examples of how the
expression might be used. Perhaps life is too short
to explain humor to Evertjan, but I'll try, anyway. :)
Dave "Crash" Dummy
2017-01-05 03:57:02 UTC
Post by Evertjan.
It is a pity that VBS has only the mid() function and has lost the mid()
statement found in earlier forms of Basic, because there the swap-function
temp = mid(txt,n1,1)
mid(txt,n1,1) = mid(txt,n2,1)
mid(txt,n2,1) = temp
You can still come close:

temp1=mid(txt,n1,1)
temp2=mid(txt,n2,1)
txt=left(txt,n1-1) & temp2 & mid(txt,n1+1)
txt=left(txt,n2-1) & temp1 & mid(txt,n2+1)
--
Crash

All this time I thought my analyst was saying I'm psychic...
Evertjan.
2017-01-05 08:54:01 UTC
Post by Dave "Crash" Dummy
Post by Evertjan.
It is a pity that VBS has only the mid() function and has lost the
mid() statement found in earlier forms of Basic, because there the
temp = mid(txt,n1,1)
mid(txt,n1,1) = mid(txt,n2,1)
mid(txt,n2,1) = temp
temp1=mid(txt,n1,1)
temp2=mid(txt,n2,1)
txt=left(txt,n1-1) & temp2 & mid(txt,n1+1)
txt=left(txt,n2-1) & temp1 & mid(txt,n2+1)
Not that simple,
as when n1 or n2 is 1,
you will get an error.

left(txt,0)
should, imho, return "",
but it fails to do that in VBS.
--
Evertjan.
The Netherlands.
Dave "Crash" Dummy
2017-01-05 09:51:27 UTC
Post by Evertjan.
Post by Dave "Crash" Dummy
Post by Evertjan.
It is a pity that VBS has only the mid() function and has lost the
mid() statement found in earlier forms of Basic, because there the
temp = mid(txt,n1,1)
mid(txt,n1,1) = mid(txt,n2,1)
mid(txt,n2,1) = temp
temp1=mid(txt,n1,1)
temp2=mid(txt,n2,1)
txt=left(txt,n1-1) & temp2 & mid(txt,n1+1)
txt=left(txt,n2-1) & temp1 & mid(txt,n2+1)
Not that simple,
as when n1 or n2 is 1,
you will get an error.
left(txt,0)
should, imho, return "",
but it fails to do that in VBS.
I don't get an error. I get a zero length string for left(txt,0),
but no error. The zero defines the length of the sample,
not its location. The swap still works. Tried it with this:

txt="abcdefghi"
n1=1
n2=5
temp1=mid(txt,n1,1)
temp2=mid(txt,n2,1)
msgbox left(txt,0)
txt=left(txt,n1-1) & temp2 & mid(txt,n1+1)
txt=left(txt,n2-1) & temp1 & mid(txt,n2+1)
msgbox txt & vbCRLF & "abcdefghi"
--
Crash

"Celibacy is the worst form of self-abuse."
~ Peter De Vries ~
Evertjan.
2017-01-05 11:17:31 UTC
Post by Dave "Crash" Dummy
Post by Evertjan.
Post by Dave "Crash" Dummy
Post by Evertjan.
It is a pity that VBS has only the mid() function and has lost the
mid() statement found in earlier forms of Basic, because there the
temp = mid(txt,n1,1)
mid(txt,n1,1) = mid(txt,n2,1)
mid(txt,n2,1) = temp
temp1=mid(txt,n1,1)
temp2=mid(txt,n2,1)
txt=left(txt,n1-1) & temp2 & mid(txt,n1+1)
txt=left(txt,n2-1) & temp1 & mid(txt,n2+1)
Not that simple,
as when n1 or n2 is 1,
you will get an error.
left(txt,0)
should, imho, return "",
but it fails to do that in VBS.
I don't get an error. I get a zero length string for left(txt,0),
but no error. The zero defines the length of the sample,
txt="abcdefghi"
n1=1
n2=5
temp1=mid(txt,n1,1)
temp2=mid(txt,n2,1)
msgbox left(txt,0)
txt=left(txt,n1-1) & temp2 & mid(txt,n1+1)
txt=left(txt,n2-1) & temp1 & mid(txt,n2+1)
msgbox txt & vbCRLF & "abcdefghi"
Okay!

<%
tx = "abcdefghi"
old = tx : response.write "old: " & old & "<br><br>"
n1 = 1 : temp1 = mid(tx,n1,1)
n2 = 5 : temp2 = mid(tx,n2,1)

tx = rp1(tx, n1, temp2)
tx = rp1(tx, n2, temp1)

response.write "string replace: " & tx & "<br><br>"

tx = old

tx = rp2(tx, n1, temp2)
tx = rp2(tx, n2, temp1)

response.write "regex replace: " & tx & "<br>"

function rp1(str,beg,letter)
rp1 = left(str,beg-1) & letter & mid(str,beg+1)
end function

function rp2(str,beg,letter)
Set myRegExp = New RegExp
myRegExp.Pattern = "^(.{" & beg-1 & "})."
rp2 = myRegExp.replace(str,"$1"&letter)
end function

%>
--
Evertjan.
The Netherlands.
Mayayana
2017-01-05 14:38:06 UTC
"Dave "Crash" Dummy" <***@invalid.invalid> wrote

| > temp = mid(txt,n1,1)
| > mid(txt,n1,1) = mid(txt,n2,1)
| > mid(txt,n2,1) = temp
|
| You can still come close:
|

It's not hard to work around the Mid statement,
but it's very inefficient, requiring the allocation of
new strings. With large strings it gets *very* slow.
The Mid statement treats the string like an array,
replacing existing characters with new ones. So you
don't need to do s = Left(s1, x) & s2 & Right(s1, y)
You just paste s2 into the existing string at offset
x. It's a direct write with no allocations.

I use that method in VB6 to rebuild strings with
amazing speed. I use a SafeArray structure to
point at the memory used by the string and treat
it as an array. Then I tokenize the string numerically,
walking the array and dealing with characters as
numbers. At the same time I allocate a string big
enough to hold my rebuild and just Mid into it. If
VBS could do that it could help a lot to avoid the
clumsy process of string snipping and pasting with
variants.
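VBScript has no Mid statement, so the preallocate-and-paste approach described above is not available there. The closest VBScript substitute is the array-and-Join technique: copy only the pieces to keep into an array and let Join build the result in one go, instead of rebuilding the large string once per deletion. A minimal sketch, assuming all ranges to delete are known in advance, use the same 1-based inclusive positions as the Left/Mid line in the original question, and are sorted and non-overlapping. The name RemoveRanges is hypothetical, and its speed on a 1 MB string has not been measured here:

```vbscript
' Hypothetical helper: delete several ranges from sText in one pass.
' rangeBegins/rangeEnds are arrays of 1-based, inclusive positions,
' sorted by start and non-overlapping (same convention as
' Left(sText, RangeBegin-1) & Mid(sText, RangeEnd+1)).
Function RemoveRanges(sText, rangeBegins, rangeEnds)
    Dim parts(), i, pos, n
    n = UBound(rangeBegins)
    ReDim parts(n + 1)                  ' one slot per kept piece, plus the tail
    pos = 1                             ' first character not yet copied
    For i = 0 To n
        ' keep the text between the previous range and this one
        parts(i) = Mid(sText, pos, rangeBegins(i) - pos)
        pos = rangeEnds(i) + 1          ' skip over the range itself
    Next
    parts(n + 1) = Mid(sText, pos)      ' everything after the last range
    RemoveRanges = Join(parts, "")      ' single concatenation at the end
End Function
```

For example, RemoveRanges("qwerty12345", Array(3, 8), Array(6, 9)) deletes "erty" and "23" and returns "qw145". If the ranges are discovered while scanning, collecting their positions first and deleting them all at the end avoids copying the whole string after every hit.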
Dave "Crash" Dummy
2017-01-05 18:11:33 UTC
Post by Mayayana
| > temp = mid(txt,n1,1)
| > mid(txt,n1,1) = mid(txt,n2,1)
| > mid(txt,n2,1) = temp
|
|
It's not hard to work around the Mid statement,
but it's very inefficient, requiring the allocation of
new strings. With large strings it gets *very* slow.
The Mid statement treats the string like an array,
replacing existing characters with new ones. So you
don't need to do s = Left(s1, x) & s2 & Right(s1, y)
You just paste s2 into the existing string at offset
x. It's a direct write with no allocations.
I thought the point was that you could not use mid() to paste a string
into the string in VBScript.
Post by Mayayana
I use that method in VB6 to rebuild strings with
amazing speed. I use a SafeArray structure to
point at the memory used by the string and treat
it as an array. Then I tokenize the string numerically,
walking the array and dealing with characters as
numbers. At the same time I allocate a string big
enough to hold my rebuild and just Mid into it. If
VBS could do that it could help a lot to avoid the
clumsy process of string snipping and pasting with
variants.
--
Crash

"The future ain't what it used to be."
~ Yogi Berra ~
Evertjan.
2017-01-05 21:42:23 UTC
Post by Dave "Crash" Dummy
I thought the point was that you could not use mid() to paste a string
into the string in VBScript.
There is no mid()-statement in VBS,
only a mid()-function.

There was a mid() statement in earlier Basics,
possibly introduced by the Basic that a young man called
William Gates wrote for DOS and for Central Data.
--
Evertjan.
The Netherlands.
Dave "Crash" Dummy
2017-01-05 22:35:16 UTC
Post by Evertjan.
Post by Dave "Crash" Dummy
I thought the point was that you could not use mid() to paste a string
into the string in VBScript.
There is no mid()-statement in VBS,
only a mid()-function.
There was mid()-statement in earlier Basic's,
possibly introduced by the Basic a young man called
William Gates wrote for DOS and for Central Data.
Basic precludes Bill Gates by a decade. I first learned Basic in the
60's, but I don't remember if that generation had a "mid" statement or
function. It was "string code" as opposed to "object oriented code."
--
Crash

English is not my native tongue; I'm an American.
Evertjan.
2017-01-05 23:46:03 UTC
Post by Dave "Crash" Dummy
Post by Evertjan.
There was mid()-statement in earlier Basic's,
possibly introduced by the Basic a young man called
William Gates wrote for DOS and for Central Data.
Basic precludes Bill Gates by a decade. I first learned Basic in the
60's, but I don't remember if that generation had a "mid" statement or
function. It was "string code" as opposed to "object oriented code."
I doubt that it 'precludes'; rather, it predates.

However, this boy William introduced some syntax we still use today,
when writing 'Altair Basic', 'DOS Basic' and 'Central Data Basic',
and I vaguely think those included the string manipulators left$, mid$, right$.

==========================

You can call him on the number in Albuquerque, NM below:

"8K RESERVED WORDS INCLUDE ALL THOSE ABOVE, AND IN ADDITION ASC AND ATN CHR$
CLOAD CONT COS CSAVE DEF EXP FN FRE INP LEFT$ LEN LOG MID$ NULL ON OR NOT
OUT PEEK POKE POS RIGHT SPC( STR$ TAN VAL WAIT Remember, in the 4K version
of BASIC variable names are only a letter"

"If any immediate problems with MITS software are encountered, feel free to
give us a call at (505), 265-7553. The Software Department is at Ext. 3; and
the joint authors of the ALTAIR BASIC Interpreter, Bill Gates, Paul Allen
and Monte Davidoff, will be glad to assist you."

<http://www.altair32.com/pdf/Altair_8800_BASIC_Reference_Manual_1975.PDF>
--
Evertjan.
The Netherlands.
Dave "Crash" Dummy
2017-01-06 02:03:38 UTC
Post by Evertjan.
Post by Dave "Crash" Dummy
Post by Evertjan.
There was mid()-statement in earlier Basic's,
possibly introduced by the Basic a young man called
William Gates wrote for DOS and for Central Data.
Basic precludes Bill Gates by a decade. I first learned Basic in the
60's, but I don't remember if that generation had a "mid" statement or
function. It was "string code" as opposed to "object oriented code."
I doubt that it 'precludes', it rather predates.
You're right. I'm getting sloppy. Thank you.
--
Crash

Mayayana
2017-01-05 23:41:19 UTC
"Dave "Crash" Dummy" <***@invalid.invalid> wrote

| > You just paste s2 into the existing string at offset
| > x. It's a direct write with no allocations.
|
| I thought the point was that you could not use mid() to paste a string
| into the string in VBScript.
|

Yes, that's what I'm saying. In VB6 there's a
Mid statement that allows one to paste into a
string, and it really is a paste. It's very fast,
treating the string as an array and replacing
any number of characters directly. One
can't use the Mid *function* to do anything
but retrieve part of a string.

Mid statement:
Mid(string, start, [optional length to replace]) = replacementString

s = "abcdefghijklmnop"
s1 = "apple"
Mid(s, 3, 4) = s1

Result: s = "abapplghijklmnop"
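VBScript cannot overwrite a string in place, but the semantics of the Mid statement shown above can be mimicked by a function that returns a new string. A sketch with a hypothetical name, MidReplace: it reproduces the behavior (at most length characters are replaced, never more than the replacement holds, and the string never grows), but not the speed, because it still allocates a new string on every call. Unlike the VB6 statement, the length argument is required here:

```vbscript
' Hypothetical helper mimicking VB6's  Mid(s, start, length) = repl.
' Returns a new string; the real statement overwrites s in place.
Function MidReplace(s, start, length, repl)
    Dim n
    n = length
    If n > Len(repl) Then n = Len(repl)                     ' can't paste more than repl holds
    If n > Len(s) - start + 1 Then n = Len(s) - start + 1   ' never write past the end of s
    If n < 0 Then n = 0
    MidReplace = Left(s, start - 1) & Left(repl, n) & Mid(s, start + n)
End Function
```

With the example above, MidReplace("abcdefghijklmnop", 3, 4, "apple") gives "abapplghijklmnop", the same result as the Mid statement.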
Ulrich Möller
2017-01-03 14:03:41 UTC
Hi Rudy,
Post by R.Wieser
I need to remove, in a string, a number of characters between two points.
Currently I'm doing that with
sText = left(sText, RangeBegin-1) & mid(sText, RangeEnd+1)
, but as I have to do that quite a few times in a rather large string (1
MByte) the whole thing gets mighty slow (probably because of the
again-and-again recreation of the string).
So, I'm looking for a more economical (read: faster) way to do the same.
The problem is that VBScript doesn't do simple overwriting, and the
"replace" command does not accept a start position -- or rather it does, but
that one throws everything away before that point, making it useless to me.
:-\
maybe this is interesting for you:
http://www.codeguru.com/csharp/.net/net_asp/tutorials/article.php/c19367/VBScript-String-Concatenation-And-Why-It-Should-Be-Avoided-Like-The-Plague.htm

Depending on your exact problem, the array method is preferable to
simple string concatenation.

Ulrich
Ulrich Möller
2017-01-03 14:26:10 UTC
test snippet for using ado stream for concatenating strings:

set buffer = WScript.CreateObject("ADODB.Stream")
buffer.Open
buffer.type = 2
buffer.WriteText("String1")
buffer.WriteText("String2")
buffer.position = 0
strTemp = buffer.readText()
buffer.close
set buffer = Nothing

Ulrich
R.Wieser
2017-01-03 16:33:57 UTC
Ulrich,
That method also crossed my mind as a possibility. I did not, however, assume
it would be faster than an in-memory method, but I could be wrong about that
...

Goddamn, clumsy hacks like that make me think back to my time with DOS and
text filtering using batch files/commands. :-\

Regards,
Rudy Wieser
R.Wieser
2017-01-03 16:26:22 UTC
Ulrich,
... the array method is a more preferable method instead of
simple string concatenations.
:-) Mayayana just reminded me of the same. Thanks for the suggestion.

Regards,
Rudy Wieser
Evertjan.
2017-01-03 14:06:59 UTC
Post by R.Wieser
I need to remove, in a string, a number of characters between two
points. Currently I'm doing that with
sText = left(sText, RangeBegin-1) & mid(sText, RangeEnd+1)
, but as I have to do that quite a few times in a rather large string (1
MByte) the whole thing gets mighty slow (probably because of the
again-and-again recreation of the string).
So, I'm looking for a more economical (read: faster) way to do the same.
The problem is that VBScript doesn't do simple overwriting, and the
"replace" command does not accept a start position -- or rather it does,
but that one throws everything away before that point, making it useless
to me.
:-\
Does anyone have an idea how to do it ?
RegEx is your friend:

sText = "qwerty12345"
RangeBegin = 2
RangeEnd = 6
Range = RangeEnd - RangeBegin

Set myRegExp = New RegExp
myRegExp.Pattern = "^(.{" & RangeBegin & "}).{" & Range & "}"
sText = myRegExp.replace(sText,"$1")
response.write sText ' qw12345
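A variant of the regex above, written as a function that takes the same 1-based, inclusive RangeBegin/RangeEnd as the Left/Mid line in the original question (in the example above, RangeBegin counts the characters kept, which is one less). It uses [\s\S] instead of "." because "." in VBScript's RegExp does not match a line feed, which a 1 MB text is likely to contain. RemoveRangeRegex is a hypothetical name, and whether it is faster than Left/Mid on a large string is untested:

```vbscript
' Hypothetical helper: remove characters rangeBegin..rangeEnd
' (1-based, inclusive) using a RegExp anchored at the start of the text.
Function RemoveRangeRegex(sText, rangeBegin, rangeEnd)
    Dim re
    Set re = New RegExp
    ' group 1 = the rangeBegin-1 characters to keep,
    ' then the rangeEnd-rangeBegin+1 characters to drop;
    ' [\s\S] matches any character, including vbLf
    re.Pattern = "^([\s\S]{" & (rangeBegin - 1) & "})[\s\S]{" & _
                 (rangeEnd - rangeBegin + 1) & "}"
    RemoveRangeRegex = re.Replace(sText, "$1")
End Function
```

With the same data as above, RemoveRangeRegex("qwerty12345", 3, 6) removes "erty" and returns "qw12345".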
--
Evertjan.
The Netherlands.
R.Wieser
2017-01-03 16:29:48 UTC
Evertjan
...
Post by Evertjan.
sText = myRegExp.replace(sText,"$1")
I might be wrong, but is the line above not also a simple concatenation
which, as it's VBS, is subject to the same sloppy allocations (as Mayayana
put it)?

Hmmm .. have to test it I guess

Thanks for the suggestion.

Regards,
Rudy Wieser
Dave "Crash" Dummy
2017-01-03 16:17:08 UTC
Post by R.Wieser
Hello All,
I need to remove, in a string, a number of characters between two
points. Currently I'm doing that with
sText = left(sText, RangeBegin-1) & mid(sText, RangeEnd+1)
, but as I have to do that quite a few times in a rather large string
(1 MByte) the whole thing gets mighty slow (probably because of the
again-and-again recreation of the string).
So, I'm looking for a more economical (read: faster) way to do the same.
The problem is that VBScript doesn't do simple overwriting, and the
"replace" command does not accept a start position -- or rather it
does, but that one throws everything away before that point, making
it useless to me. :-\
Does anyone have an idea how to do it ?
I'm confused, as usual. Is the string of characters that you want to
remove multiple occurrences of the same string? If so, you can simply
use it as the delimiter in an array then rejoin the elements without the
delimiter:

xString=mid(sText,RangeBegin,RangeEnd-RangeBegin+1)  ' +1: the character at RangeEnd is removed too
sArray=split(sText,xString)
newText=join(sArray,"")

If you are talking about removing multiple unique strings, I'll leave it
to the wordier experts here. :-)
--
Crash

R.Wieser
2017-01-03 16:39:05 UTC
Dave,
Post by Dave "Crash" Dummy
I'm confused, as usual.
That's not something I can help you with, I'm afraid. :-)
Post by Dave "Crash" Dummy
Is the string of characters that you want to remove multiple
occurrences of the same string?
Unknown (I have no control over what exactly is in the string), but could
be. If not, I could have used a simple "replace". :-)
Post by Dave "Crash" Dummy
If so, you can simply use it as the delimiter in an array then
Would that not also be subject to the same "sloppy allocation" as the method
I posted ?

Regards,
Rudy Wieser