Discussion:
script to search list of strings in files/directories
z***@gmail.com
2016-08-30 07:29:25 UTC
General description of the problem : I have a DLL which exports many functions. I want to check which of those exported functions are the functions that are actually in use (in the source code of an application that uses this DLL).

So here is the same question in detail:

I have a file (MyFuncs.h) which contains many lines that look like this:

MY_API int function_0(int param_0); // Descriptiom_0
MY_API int function_1(int param_1, int param_1); // Descriptiom_1
MY_API int function_2(); // Descriptiom_0 djdjdjd
MY_API int function_3(void); // Descriptiom_0 s''s's
Comment: The function name is ALWAYS the third word.

I have a directory named C:\MyDir which contains many files of source (*.cpp and *.h).

I need a script that searches for all of the functions in C:\MyDir\*.cpp and also in C:\MyDir\*.h

Does anybody know how to help? Any scripting language is good, as long as it runs on Windows_7 (WinXP could be nice).

Any advice ?

Thanks
Zmau
Evertjan.
2016-08-30 08:52:36 UTC
Post by z***@gmail.com
General description of the problem : I have a DLL which exports many
functions. I want to check which of those exported functions are the
functions that are actually in use (in the source code of an application
that uses this DLL).
MY_API int function_0(int param_0); // Descriptiom_0
MY_API int function_1(int param_1, int param_1); // Descriptiom_1
MY_API int function_2(); // Descriptiom_0 djdjdjd
MY_API int function_3(void); // Descriptiom_0 s''s's
Comment: The function name is ALWAYS the third word.
I have a directory named C:\MyDir which contains many files of source (*.cpp and *.h).
I need a script that searches for the "all of the functions" in
C:\MyDir*.cpp and also in C:\MyDir*.h
Does anybody know how to help? Any scripting language is good,
This NG is about VBScript, ask in other NGs for other languages.
Post by z***@gmail.com
as long as it runs on Windows_7 (WinXP could be nice).
Any advice ?
Using wscript:

- get each file using Scripting.FileSystemObject
- test each line with regex and output to the window
or to a resultfile.
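
A minimal sketch of the two steps above, meant to be run with cscript so that the output goes to the console. The folder C:\MyDir comes from the question; the name function_1 and the whole-word pattern are only placeholders chosen for illustration.

```vbscript
' Sketch: scan every .cpp and .h file in a folder, line by line,
' and report the lines that match a regular expression.
' Assumptions: C:\MyDir and function_1 are placeholders.
Option Explicit
Dim fso, re, file, ts, lineNo, line, ext

Set fso = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "\bfunction_1\b"   ' whole-word match, so function_10 is not reported
re.IgnoreCase = False

For Each file In fso.GetFolder("C:\MyDir").Files
    ext = LCase(fso.GetExtensionName(file.Name))
    If ext = "cpp" Or ext = "h" Then
        Set ts = file.OpenAsTextStream(1)   ' 1 = ForReading
        lineNo = 0
        Do Until ts.AtEndOfStream
            line = ts.ReadLine
            lineNo = lineNo + 1
            If re.Test(line) Then
                WScript.Echo file.Name & "(" & lineNo & "): " & line
            End If
        Loop
        ts.Close
    End If
Next
```

To write to a result file instead, the WScript.Echo call could be replaced by WriteLine on a text stream created with fso.CreateTextFile.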
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Mayayana
2016-08-31 18:22:58 UTC
| Does anybody know how to help? Any scripting language is good, as long as
it runs on Windows_7 (WinXP could be nice).
|
| Any advice ?
|

It's simple, but it's a lot of steps and it sounds like
you want something pre-made. You can get the Windows
Script Host help file to guide you. The object
Scripting.FileSystemObject provides methods to
open a file as a TextStream object, then read that out as
a text string. It also provides methods to access the folder
hierarchy as objects. The InStr function allows you to
check for the presence of "function_1" in the string you
get from TextStream.ReadAll. (Don't pay any attention
to Evertjan. He's a regular expressions fanatic. He'd
recommend that to get your laundry clean. :)

That should be enough info to start. If you want someone
else to write it, for free, you may have a while to wait.
Mau Z
2016-08-31 20:05:33 UTC
Thanks,
I really wanted something pre-made.
I thought that it would be simple for people who know VBScript.
I am not really familiar with VBScript, so I guess it will take me a lot of time.
The original idea was to save time...
I'll wait a little longer and decide what's next.

Thanks
zmau
Evertjan.
2016-08-31 20:16:11 UTC
Post by Mau Z
I really wanted something pre-made.
I thought that it is simple for people that know vbScript.
I am not really familiar with vbscripts too much, so I guess it will
take me a lot of time. The original idea was to save time...
Scripts usually will save you time if used far more than once.
Post by Mau Z
I'll wait a little longer and decide whats next.
In the meantime you could teach yourself some regex.

It will save you a lot of time when automating stuff like you asked,
and even if not, it will strengthen your skill in logical thinking.

Regex is not some form of magic, it is just a fantastic shorthand for string
manipulation and string testing.
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Mayayana
2016-08-31 20:39:54 UTC
"Evertjan." wrote:

|
| Regex is not some form of magic, it is just a fantastic shorthand for
string
| manipulation and string testing.
|

You're complicating things unnecessarily.
If he does decide to do it he has exact strings
to look for: "function_1" or maybe "function_1(",
or some such.
He's not trying to find patterns. For finding
an exact substring, it doesn't get simpler or
faster than Instr.
Evertjan.
2016-08-31 20:52:03 UTC
Post by Mayayana
|
| Regex is not some form of magic,
| it is just a fantastic shorthand for string
| manipulation and string testing.
|
You're complicating things unnecessarily.
Well, who will decide what is necessary and what not,
and what is complicating and what is streamlining and
what is just fun programming and what is easily understandable?
Post by Mayayana
If he does decide to do it he has exact strings
to look for: "function_1" or maybe "function_1(",
or some such.
He's not trying to find patterns. For finding
an exact substring, it doesn't get simpler or
faster than Instr.
I do not think so.

Regex test() gives you the freedom of decision on any number of small
differences, like case differences.
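
As a rough illustration of that freedom, here is a sketch using the VBScript RegExp object. The name function_1 and the sample strings are made up for the example; the pattern accepts any letter case and optional whitespace before the opening parenthesis.

```vbscript
' Sketch: one pattern that tolerates case differences and optional
' whitespace before the opening parenthesis.
' Assumption: function_1 and the sample strings are placeholders.
Option Explicit
Dim re, sample

Set re = New RegExp
re.Pattern = "\bfunction_1\s*\("
re.IgnoreCase = True

For Each sample In Array("x = function_1(3);", "x = Function_1 (3);", "x = function_10(3);")
    ' Test returns True when the pattern occurs anywhere in the string
    WScript.Echo sample & " -> " & re.Test(sample)
Next
```

The first two samples should match and the third should not, because in function_10 the name is followed by a digit rather than by whitespace or a parenthesis. A plain InStr search for "function_1" would report all three.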
--
Evertjan.
The Netherlands.
(Please change the x'es to dots in my emailaddress)
Mayayana
2016-09-01 00:54:10 UTC
"Evertjan." wrote

| > If he does decide to do it he has exact strings
| > to look for: "function_1" or maybe "function_1(",
| > or some such.
| > He's not trying to find patterns. For finding
| > an exact substring, it doesn't get simpler or
| > faster than Instr.
|
| I do not think so.
|
| Regex test() gives you the freedom of decision on any number of small
| differences, like case differences.
|

x = InStr(1, s, "function_1", 1)

starting point, string to search, search string, case sensitivity
(search the whole string for "function_1", not case sensitive)

There are no other differences in this case. It's a
specific string.

I would note, though, that I've found when doing a
large number of InStr calls on a large string, it's notably
faster to do:

s = UCase(s)
x = InStr(1, s, "function_1", 0)
etc...

The case-insensitive search takes more time. With
a single operation it's not discernible, but with hundreds
of calls, using case-sensitive search can speed things up.
Dave "Crash" Dummy
2016-09-01 02:39:48 UTC
Post by Mayayana
| > If he does decide to do it he has exact strings
| > to look for: "function_1" or maybe "function_1(",
| > or some such.
| > He's not trying to find patterns. For finding
| > an exact substring, it doesn't get simpler or
| > faster than Instr.
|
| I do not think so.
|
| Regex test() gives you the freedom of decision on any number of small
| differences, like case differences.
|
x = InStr(1, s, "function_1", 1)
starting point, string to search, search string, case sensitivity
(search the whole string for "function_1", not case sensitive)
There are no other differences in this case. It's a
specific string.
I would note, though, that I've found when doing a
large number of InStr calls on a large string, it's notably
faster to do:
s = UCase(s)
x = InStr(1, s, "function_1", 0)
etc...
The case-insensitive search takes more time. With
a single operation it's not discernible, but with hundreds
of calls using case-sensitive search can speed things up.
Shouldn't that be
s = LCase(s)
x = InStr(1, s, "function_1", 0)
--
Crash

Life is short. Eat dessert first.
Mayayana
2016-09-01 14:03:24 UTC
"Dave "Crash" Dummy" wrote

| > s = UCase(s)
| > x = InStr(1, s, "function_1", 0)
| > etc...
| >
| > The case-insensitive search takes more time. With
| > a single operation it's not discernible, but with hundreds
| > of calls using case-sensitive search can speed things up.
|
| Shouldn't that be
| s = LCase(s)
| x = InStr(1, s, "function_1", 0)
|

:) Good catch. I should have written that
either LCase or UCase could be used, but I
was being lazy.
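
A minimal sketch of that point: either function works, provided the text and the search term are normalized the same way. The sample text and the three names are placeholders, not taken from any real header.

```vbscript
' Sketch: normalize the text and each search term to the same case,
' then use the case-sensitive (binary) comparison for every lookup.
' Assumption: the sample text and the names are placeholders.
Option Explicit
Dim src, lowered, term

src = "int r = Function_1(2) + FUNCTION_3();"
lowered = LCase(src)   ' done once, before any searching

For Each term In Array("function_1", "function_2", "function_3")
    ' 0 = binary compare; the term is lowered as well, so both sides agree
    If InStr(1, lowered, LCase(term), 0) > 0 Then
        WScript.Echo term & " found"
    Else
        WScript.Echo term & " not found"
    End If
Next
```

With this sample text, function_1 and function_3 should be reported as found and function_2 as not found.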

| Life is short. Eat dessert first.

Which is the dessert? If I ate a big bowl of
chocolate ice cream I think the next thing
I'd want would be something more plain, like
salad or pasta.
Dave "Crash" Dummy
2016-09-01 15:09:34 UTC
Post by Mayayana
| > The case-insensitive search takes more time. With
| > a single operation it's not discernible, but with hundreds
| > of calls using case-sensitive search can speed things up.
I don't know that normalizing the string prior to running the InStr
function is any faster. If the case insensitive option is selected the
function is going to normalize the string before doing the search,
anyway. It may even be slower to normalize the string in a separate
operation before running InStr.
--
Crash

The opposable thumb was an evolutionary plus
until "text" became a verb.
Mayayana
2016-09-01 16:45:07 UTC
"Dave "Crash" Dummy" wrote

| > | > The case-insensitive search takes more time. With
| > | > a single operation it's not discernible, but with hundreds
| > | > of calls using case-sensitive search can speed things up.
|
| I don't know that normalizing the string prior to running the InStr
| function is any faster. If the case insensitive option is selected the
| function is going to normalize the string before doing the search,
| anyway. It may even be slower to normalize the string in a separate
| operation before running InStr.
|

Your speculation seems reasonable, but Microsoft
apparently didn't think the same way. I think what
the InStr function probably does is to search numerically.
So a CS search for "A" will look for byte 65. A non-CS
search will look for 65 or 97. Then that will get less
efficient as the string gets longer and each character
adds a dual search. If 65 or 97 is found then look for
66 or 98. If any of those 4 combinations are found then
look for 67 or 99. Etc.

Here's a simple test:

400 iterations of searching a text file, 573 KB.
The nonsense word "AggyDaggy" (to ensure uniqueness)
was added near the end and then a search was run.

-------------------------------------------
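Dim FSO, Arg, TS, s1, x1, x2, i, Ret

' Read the whole file named on the command line into s1
Arg = WScript.Arguments(0)
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TS = FSO.OpenTextFile(Arg, 1)   ' 1 = ForReading
s1 = TS.ReadAll
TS.Close
Set TS = Nothing

x1 = Timer
s1 = UCase(s1)   ' normalize once, inside the timed section
For i = 1 to 400
    Ret = InStr(1, s1, "AGGYDAGGY", 0)   ' 0 = case-sensitive (binary) compare
Next
x2 = Timer

MsgBox x2 - x1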
--------------------------------------

Case sensitive: .234375 seconds
UCase followed by case sensitive: .53125 seconds
non-case sensitive: 3.875 seconds
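
For anyone who wants to repeat the comparison without a 573 KB file, here is a sketch that times all three variants on a synthetic string. The string size, its content, and the 400 iterations are arbitrary assumptions, so the absolute numbers will differ from the ones above; run it with cscript.

```vbscript
' Sketch: time the three search variants on a synthetic string.
' Assumption: the string size and the iteration count are arbitrary.
Option Explicit
Dim s, u, i, ret, t0

s = String(500000, "x") & "AggyDaggy"   ' the search word sits at the very end

t0 = Timer
For i = 1 To 400
    ret = InStr(1, s, "AggyDaggy", 0)   ' case sensitive
Next
WScript.Echo "Case sensitive: " & (Timer - t0)

t0 = Timer
u = UCase(s)                            ' normalize once, inside the timing
For i = 1 To 400
    ret = InStr(1, u, "AGGYDAGGY", 0)   ' case sensitive on the normalized copy
Next
WScript.Echo "UCase, then case sensitive: " & (Timer - t0)

t0 = Timer
For i = 1 To 400
    ret = InStr(1, s, "aggydaggy", 1)   ' case insensitive
Next
WScript.Echo "Case insensitive: " & (Timer - t0)
```

Timer counts seconds since midnight, so a run that crosses midnight would give a meaningless difference.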

I've consistently found that two things can greatly
increase the speed of scripts that have to do extensive
work with strings:

1) Do non-case sensitive string searches by applying UCase once, then searching case sensitively.
2) Build strings with an array rather than concatenation.

The latter method uses an array member for each
concatenation. Instead of doing s = s & "more text"
it does A(x) = "more text". Then it uses Join at the
end. I actually got that idea from Matthew Curland's book.
He was one of the original VB6 designers and pointed
out that Join walks the whole array, measuring the
content, then allocates a single string to accommodate
it all. Concatenating must allocate a new string every
time, so adding "more" to a 3 MB ANSI string requires
allocating a new string of 3 MB + 4 bytes. Memory
allocation takes a lot more time than calculations,
and slows as it gets bigger.

In typical usage it doesn't much matter. One InStr call
will be insignificant no matter which way it's done. A half dozen
concatenations don't cost much. But it's not unusual to need
to optimize. Using the two methods above seems like more
work but can actually cut a lot of time out of operations.
Also, Replace is extremely slow, probably because of the
same concatenation problem. It's actually often much faster
to write a complex tokenizing routine than to run a few
Replace operations.
Dave "Crash" Dummy
2016-09-01 17:41:04 UTC
I've consistently found that two things can greatly increase the
speed of scripts that have to do extensive work with strings:
1) non-case sensitive string search using UCase.
2) Build strings with an array rather than concatenation.
The latter method uses an array member for each concatenation.
Instead of doing s = s & "more text" it does A(x) = "more text". Then
it uses Join at the end. I actually got that idea from Matthew
Curland's book. He was one of the original VB6 designers and pointed
out that Join walks the whole array, measuring the content, then
allocates a single string to accommodate it all. Concatenating must
allocate a new string every time, so adding "more" to a 3 MB ANSI
string requires allocating a new string of 3 MB + 4 bytes. Memory
allocation takes a lot more time than calculations, and slows as it
gets bigger.
How do you predict the required size of the array? Using "redim
preserve" for each entry seems kind of awkward.
--
Crash

"If the world was perfect, it wouldn't be."
~ Yogi Berra ~
Mayayana
2016-09-01 19:19:02 UTC
"Dave "Crash" Dummy" wrote

| How do you predict the required size of the array? Using "redim
| preserve" for each entry seems kind of awkward.

Yes, and it's costly. But not as costly as concatenation.
I just guess based on the job and then write the code
to redim if necessary. It also depends on size. If I expect to
concatenate 10-60 strings I might start with 100. If I expect
to concatenate 1000 or more, I might start with 2000. I use
whatever provides a good chance of not needing to redim
in most cases, while being as small as possible. Then I do
like so:

Dim A1(), UB, i

ReDim A1(100)
UB = 100
i = 0

Do
    If i = UB Then
        ' Reached the last slot: grow the array by another 100 elements
        UB = UB + 100
        ReDim Preserve A1(UB)
    End If

    ' .... lots of iterations here ....

Loop

UBound is also a costly call, so I'm designing
it to avoid calling UBound. It only needs to
compare UB against i.
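
A complete sketch of the same idea, from the first piece to the final Join. The chunk size of 100, the 250 sample pieces, and the variable names are assumptions made for the example.

```vbscript
' Sketch: collect pieces in an array that grows in chunks, then Join once.
' Assumption: the chunk size of 100 and the sample pieces are placeholders.
Option Explicit
Dim parts(), ub, count, i, result

ub = 100
ReDim parts(ub)
count = 0                              ' number of slots filled so far

For i = 1 To 250
    If count > ub Then                 ' no free slot left: grow by one chunk
        ub = ub + 100
        ReDim Preserve parts(ub)
    End If
    parts(count) = "line " & i
    count = count + 1
Next

ReDim Preserve parts(count - 1)        ' trim unused slots before joining
result = Join(parts, vbCrLf)           ' one allocation for the whole string
WScript.Echo Len(result)
```

Trimming before Join keeps the empty trailing slots from adding extra separators to the result.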

An example: A while back I wrote a utility to
convert HXS files to CHM. It requires going
through each HTML file (page) in the help file,
looking for particular, faulty links, replacing them
with valid links, then rewriting the page. In inet.hxs
(the IE DOM help file) there were something like
5000 links in 1000 HTML files, so speeding it up just
a little helped a lot. But with each page I just used
an array of UBound 100 to hold the page content
for joining after it was edited. So maybe in a handful of
pages that array had to be redimmed. But for most,
100 strings was more than enough, while allocating
an array of ubound 100 is not a time sucker.
R.Wieser
2016-09-01 07:25:06 UTC
Mau Z,
Post by Mau Z
I thought that it is simple for people that know vbScript.
You seem to know C++, don't you? I'm sure it should be simple for
you to put some code together which will do what you are looking for.

I mean, C++ allows you to iterate through a set of files, open them, read those
files line by line (or even as a single string blob), do simple
"fgets" parsing (for your .h files), and check for certain substrings (the
function names) using string::find (for the .cpp source files).

Put shortly, given how easily you could put something together in C++, I do
not really see why you would want to use VBScript, which you do not even
know.
Post by Mau Z
I really wanted something pre-made.
Then google for something, and expect to pay for it.
Post by Mau Z
The original idea was to save time...
Yeah, *you* saving time, because you let *us* spend our time on it. :-) :-(

No, I think you misunderstood the purpose of this newsgroup: we are here to
*help* you, not to do unpaid (commercial?) work for you(r boss?).

Regards,
Rudy Wieser
Mau Z
2016-09-05 14:36:45 UTC
Rudy,

Thank you,
Actually the story is simple. I came back to work from a long private vacation, hundreds of things were waiting on my desk, and I really did not think too much. Sorry, but that is the truth.
The thought popped up: "Hey, this should be easy to do with a script."
It really seemed to me like a simple request.
You are right, I could have done it in C++ (I really did not think of it at the time).
In my defense, I did google the subject, and did not find anything similar or even close (on Windows).

Thank you for the criticism, I will try to remember next time.


Thanks again
Zmau
Dave "Crash" Dummy
2016-09-01 18:35:01 UTC
Post by z***@gmail.com
General description of the problem : I have a DLL which exports many
functions. I want to check which of those exported functions are the
functions that are actually in use (in the source code of an
application that uses this DLL).
MY_API int function_0(int param_0); // Descriptiom_0
MY_API int function_1(int param_1, int param_1); // Descriptiom_1
MY_API int function_2(); // Descriptiom_0 djdjdjd
MY_API int function_3(void); // Descriptiom_0 s''s's
Comment: The function name is ALWAYS the third word.
I have a directory named C:\MyDir which contains many files of source (*.cpp and *.h).
I need a script that searches for the "all of the functions" in
C:\MyDir*.cpp and also in C:\MyDir*.h
Does anybody know how to help? Any scripting language is good, as
long as it runs on Windows_7 (WinXP could be nice).
Any advice ?
Thanks Zmau
Okay, here's a script that should work based on what you posted. To
run it, place the script and your MyFuncs.h file in your C:\MyDir
directory with all your other relevant files. When run, the script creates
a text file named "UsageFile.txt" which lists your .h and .cpp files and
the functions each uses.

################ FuncUsage.vbs ######################
set fso=CreateObject("Scripting.FileSystemObject")

' Collect the function names declared in MyFuncs.h
set MyFuncs=fso.OpenTextFile("MyFuncs.h")
do until MyFuncs.atEndOfStream
    line=MyFuncs.readLine
    ' Split on single spaces; the script expects the name at index 3
    if ubound(split(line)) >= 3 then
        if left(split(line)(3),9)="function_" then
            fname=split(split(line)(3),"(")(0)   ' drop the parameter list
            funclist=funclist & " " & fname
        end if
    end if
loop
MyFuncs.close
funclist=trim(funclist)

set UsageFile=fso.CreateTextFile("UsageFile.txt")

' Scan every .h and .cpp file in the current folder, except MyFuncs.h
set fldr=fso.getFolder(".")
for each file in fldr.files
    n=lcase(file.name)
    if (n<>"myfuncs.h" AND right(n,2)=".h") OR right(n,4)=".cpp" then
        set f=file.OpenAsTextStream
        data=f.readAll
        f.close
        UsageFile.writeLine file.name
        for i=0 to ubound(split(funclist))
            ptr=InStr(1,data,split(funclist)(i),1)   ' case-insensitive search
            if ptr then UsageFile.writeLine split(funclist)(i)
        next
        UsageFile.writeLine
    end if
next
UsageFile.close
--
Crash

The opposable thumb was an evolutionary plus
until "text" became a verb.
Mau Z
2016-09-05 14:21:58 UTC
Dave,

First of all I would like to apologize for not answering for such a long time - basically it was just a long weekend.

The script works. Thank you.
There was a funny issue, but I got around it.
If you are curious for the funny issue then here it is :
In my header file (MyFuncs.h), the third word was preceded by three kinds of whitespace:
1) TAB
2) one space
3) two spaces.
Why? I have no idea. It was not my code to begin with.
Anyway, the script caught the third word only in lines where it was preceded by two spaces.
Once I noticed it, I fixed MyFuncs.h and everything was OK.
I just thought that you might be interested.
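
For reference, the whitespace problem can also be handled in the script instead of in the header. The following is only a sketch: the helper name ThirdWord is made up, and it assumes the "function name is always the third word" rule from the original question.

```vbscript
' Sketch: pick the third word of a declaration regardless of whether the
' words are separated by tabs, one space, or several spaces.
' Assumption: ThirdWord is a hypothetical helper, not part of the posted script.
Option Explicit

Function ThirdWord(ByVal line)
    Dim re, words
    Set re = New RegExp
    re.Pattern = "\s+"      ' any run of spaces and/or tabs
    re.Global = True
    ' Collapse each run of whitespace to one space, then split on it
    words = Split(Trim(re.Replace(line, " ")), " ")
    If UBound(words) >= 2 Then
        ThirdWord = Split(words(2), "(")(0)   ' drop the parameter list
    Else
        ThirdWord = ""
    End If
End Function

WScript.Echo ThirdWord("MY_API int" & vbTab & "function_0(int param_0);")
WScript.Echo ThirdWord("MY_API int  function_2();")
```

Both calls should print the bare function name (function_0 and function_2), whether the name is preceded by a tab or by two spaces.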

Thanks again.

Zmau
Dave "Crash" Dummy
2016-09-05 18:43:09 UTC
Post by Mau Z
Dave,
First of all I would like to apologize for not answering for such a
long time - basically it was just a long weekend.
The script works. Thank you. There was a funny issue, but I got around it.
In my header file (MyFuncs.h), the third word preceded with three
kinds of white spaces : 1) TAB 2) one space 3) two spaces. Why ? I
have no idea. it was not my code to begin with. Anyway the script
caught the third word only in lines that was preceded by two spaces.
Once I noticed it, I fixed the MyFuncs.h and everything was OK. I
just thought that you might be interested.
I just used the sample you posted and relied on your verbal description.
There are usually several ways to do anything with VBScript. Here is
alternate code for extracting the functions from your header that will
not be affected by placement of the function names in the line.

do until MyFuncs.atEndOfStream
    line=MyFuncs.readLine
    f=instr(lcase(line),"function_")   ' position of the name, 0 if absent
    if f then
        fname=mid(line,f)
        fname=split(fname,"(")(0)      ' drop the parameter list
        funclist=funclist & " " & fname
    end if
loop
funclist=trim(funclist)
--
Crash

All this time I thought my analyst was saying I'm psychic...
Dave "Crash" Dummy
2016-09-05 19:41:39 UTC
Here's yet another way:

set fso=CreateObject("Scripting.FileSystemObject")

set MyFuncs=fso.OpenTextFile("MyFuncs.h")
data=lcase(MyFuncs.readAll)
MyFuncs.close

' Keep everything after each "function_" up to and including "(",
' so funclist looks like "function_0(function_1(..."
funcs=split(data,"function_")
for x=1 to ubound(funcs)
    funclist=funclist & "function_" & left(funcs(x),instr(funcs(x),"("))
next
names=split(funclist,"(")   ' the last element is empty and is skipped below

set UsageFile=fso.CreateTextFile("UsageFile.txt")

set fldr=fso.getFolder(".")
for each file in fldr.files
    n=lcase(file.name)
    if (n<>"myfuncs.h" AND right(n,2)=".h") OR right(n,4)=".cpp" then
        set f=file.OpenAsTextStream
        data=f.readAll
        f.close
        UsageFile.writeLine file.name
        for i=0 to ubound(names)-1
            ptr=InStr(1,data,names(i),1)   ' case-insensitive search
            if ptr then UsageFile.writeLine names(i)
        next
        UsageFile.writeLine
    end if
next
UsageFile.close
--
Crash

"I am not young enough to know everything."
~ Oscar Wilde ~
Mau Z
2016-09-05 20:07:24 UTC
Thanks
I'll check it tomorrow.
I just want to say that I was really happy :-)
You are so diligent, that I am afraid to ask more questions....

Good night
Dave "Crash" Dummy
2016-09-06 15:33:47 UTC
Thanks I'll check it tomorrow. I just want to say that I was really
happy :-) You are so diligent, that I am afraid to ask more
questions....
I'm not diligent, I'm retired. Writing scripts is a hobby, not a source
of income. Some of the more knowledgeable participants in this group use
their skills to earn a living. They are willing to share their knowledge
and help solve problems, but they should not be expected to provide a
complete finished product for free.
--
Crash

"The unexamined life is not worth living."
~ Socrates ~
Dr J R Stockton
2016-09-07 22:43:33 UTC
In microsoft.public.scripting.vbscript message <422e8189-83a1-48a9-a327-
Post by z***@gmail.com
Does anybody know how to help? Any scripting language is good, as long as it runs on Windows_7 (WinXP could be nice).
Any advice ?
MiniTrue, <http://adoxa.altervista.org/minitrue/index.html>, might help.
--
(c) John Stockton, Surrey, UK. ¬@merlyn.demon.co.uk Turnpike v6.05 MIME.
Merlyn Web Site < > - FAQish topics, acronyms, & links.