Tuts 4 You

Recursive Scan


starzboy


Posted

Hello guys,

I have written a small recursive scan function that lists all files, with their sizes, in a directory and its subdirectories. The function seems to work fine, but it is very slow.

I have tried using threads, but it is no better.

Actually, I am listing the files to find duplicates: files with the same name and size.
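For illustration, the duplicate check described here (same name, same size) can be sketched as a grouping step that runs after the scan. This is only a minimal sketch in portable C++, not the poster's MASM code; the FileEntry record and the find_duplicates name are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one scanned file; field names are illustrative only.
struct FileEntry {
    std::string path;    // full path of the file
    std::string name;    // file name without the directory part
    std::uint64_t size;  // size in bytes
};

// Groups entries by (name, size) and returns the paths of every group
// that has at least two members, i.e. the duplicate candidates.
std::vector<std::vector<std::string>> find_duplicates(const std::vector<FileEntry>& entries) {
    std::map<std::pair<std::string, std::uint64_t>, std::vector<std::string>> groups;
    for (const FileEntry& e : entries) {
        groups[std::make_pair(e.name, e.size)].push_back(e.path);
    }
    std::vector<std::vector<std::string>> result;
    for (auto& kv : groups) {
        if (kv.second.size() > 1) {
            result.push_back(std::move(kv.second));
        }
    }
    return result;
}
```

Matching on name and size only finds candidates; two such files can still differ in content.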

Can someone suggest a better approach?

Can anything be done with drivers to speed things up?

cya

starz

Posted

What do you mean by slow?

The code I use to get all files from a folder (and its subfolders) isn't exactly high-speed, but it handles several hundred files per second on a halfway decent machine.

I doubt threading will make it any faster, as the real limitation will be file system access.

We're talking about FindFirstFile/FindNextFile etc., right?

Posted

And it would be good to know which programming language you are using...

Posted

I coded a simple index tool for my FTP server (also with file size info). It scans 20,000 files in a few seconds, using just FindFirstFile and FindNextFile.

It just depends on your hard drive speed. Maybe the drive isn't defragmented?

Posted (edited)

Hey,

Thanks a ton for replying. Well, as usual, it's MASM.

Hmm...

Yeah, I'm using FindFirstFile and FindNextFile. I might as well post the source here.

It's a bit messy, but I have the better version on my laptop...

GarbageScan proc hWnd:HWND
_ifind:
invoke lstrcpy, addr sPath, addr GPath
invoke lstrcat, addr GPath, chr$("*.*")
invoke FindFirstFile, addr GPath , addr Gfnd
mov dword ptr ds:[ffind],eax
pushad
invoke lstrlen,addr GPath
mov ebx,eax
xor ecx,ecx
mov esi,offset GPath
_nxtpath:
mov al,byte ptr ds:[esi]
cmp al,05Ch
jne _noslash
inc ecx
_noslash:
dec ebx
inc esi
cmp ebx,00h
jne _nxtpath
imul ecx,04h
mov dword ptr ds:[fdist],ecx
popad
mov edi,offset fstack
add edi,dword ptr ds:[fdist]
mov eax,dword ptr ds:[ffind]
mov dword ptr ds:[edi],eax
_chkfile:
cmp byte ptr ds:[Gfnd.cFileName],2Eh
je _fnext
invoke lstrcpy,addr BPath,addr sPath
invoke lstrcat,addr BPath,addr Gfnd.cFileName
invoke GetFileAttributes,addr BPath
cmp eax,10h
jne _nodir
invoke lstrcpy,addr GPath,addr sPath
invoke lstrcat, addr GPath, addr Gfnd.cFileName
invoke lstrcat, addr GPath, chr$("\")
; Keep track of folders scanned
inc dword ptr ds:[ffoldcount]
jmp _ifind
_nodir:
;invoke SetDlgItemText,hEdit,5001,addr sPath
invoke SetDlgItemText,hWnd,1008,addr sPath
invoke CreateFile,addr BPath,GENERIC_READ+GENERIC_WRITE,FILE_SHARE_WRITE,NULL,OPEN_EXISTING,FILE_ATTRIBUTE_NORMAL+FILE_ATTRIBUTE_HIDDEN,0
push eax
invoke GetFileSize,eax,00h
invoke wsprintf,addr BPath,chr$(" %s - %d bytes."),addr Gfnd.cFileName,eax
invoke List,hWnd,addr BPath
pop eax
invoke CloseHandle,eax
; Keep track of files scanned
inc dword ptr ds:[ffilecount]
_fnext:
invoke FindNextFile, dword ptr ds:[ffind], addr Gfnd
cmp eax, 00h
jne _chkfile
; Check for Base , if base Findnext = 0 , we have reached end :)
invoke lstrlen,addr sPath
cmp eax,dword ptr ds:[forgsize]
je _fend
; Close old Handle
invoke FindClose,dword ptr ds:[ffind]
; Decrease last "\" and goto the previous folder.
invoke lstrcpy,addr GPath, addr sPath
invoke lstrlen,addr GPath
mov ebx,eax
mov esi,offset GPath
add esi,ebx
dec esi
mov byte ptr ds:[esi],00h
_nxtbkfnd:
mov al ,byte ptr ds:[esi]
cmp al,05Ch
je _sslash
mov byte ptr ds:[esi],00h
dec esi
jmp _nxtbkfnd
_sslash:
; Make sPath = GPath , after the restoring the previous Folder :)
invoke lstrcpy,addr sPath, addr GPath
; When size of current dir and original dir matches , load handle of original folder and do findnext , if find next fails = end scan.
invoke lstrlen,addr sPath
cmp eax,dword ptr ds:[forgsize]
jne _notyet
nop
pushad
invoke lstrlen,addr GPath
mov ebx,eax
xor ecx,ecx
mov esi,offset GPath
_nxtpath2:
mov al,byte ptr ds:[esi]
cmp al,05Ch
jne _noslash2
inc ecx
_noslash2:
dec ebx
inc esi
cmp ebx,00h
jne _nxtpath2
imul ecx,04h
mov dword ptr ds:[fdist],ecx
popad
; final checks if it really has no more files to scan for :)
mov edi,offset fstack
add edi,dword ptr ds:[fdist]
mov eax,dword ptr ds:[edi]
mov dword ptr ds:[ffind],eax
invoke FindNextFile, dword ptr ds:[ffind], addr Gfnd
cmp eax, 00h
jne _chkfile
nop
jmp _fend
_notyet:
; Load Handle of previous folder and do find next file.
pushad
invoke lstrlen,addr GPath
mov ebx,eax
xor ecx,ecx
mov esi,offset GPath
_nxtpath1:
mov al,byte ptr ds:[esi]
cmp al,05Ch
jne _noslash1
inc ecx
_noslash1:
dec ebx
inc esi
cmp ebx,00h
jne _nxtpath1
imul ecx,04h
mov dword ptr ds:[fdist],ecx
popad
mov edi,offset fstack
add edi,dword ptr ds:[fdist]
mov eax,dword ptr ds:[edi]
mov dword ptr ds:[ffind],eax
jmp _fnext
_fend:
invoke FindClose,dword ptr ds:[ffind]
; State the Files and Folders Scanned
mov eax, dword ptr ds:[ffilecount]
mov ecx, dword ptr ds:[ffoldcount]
invoke wsprintf,addr BPath,chr$(" %d Files and %d Folders Scanned."),eax,ecx
invoke SetDlgItemText,hWnd,1008,addr BPath
xor eax,eax
ret
GarbageScan EndP

Binary file attached below for debugging....

The better version is much faster.

Thanks again

customsearch.rar

Edited by starzboy
Posted

I think the problem is the window operations such as SetDlgItemText. Better to remove this call, or create a thread that posts the current file only every 500 milliseconds.

Also, a list view with many entries performs poorly. You can use a virtual list view instead, or just save all files into memory first and produce the output after scanning.
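The "only every 500 milliseconds" idea can be sketched without a separate thread as a small rate limiter that the scan loop consults before touching the UI. This is a minimal sketch in portable C++; UpdateThrottle and should_update are hypothetical names, and the caller is assumed to pass in the current time.

```cpp
#include <chrono>

// Hypothetical helper: tells the scan loop whether enough time has passed
// since the last UI refresh. It does not touch any window itself.
class UpdateThrottle {
public:
    using Clock = std::chrono::steady_clock;

    explicit UpdateThrottle(std::chrono::milliseconds interval)
        : interval_(interval) {}

    // Returns true for the first call and then at most once per interval.
    // On true, the caller would update the status text (e.g. SetDlgItemText).
    bool should_update(Clock::time_point now) {
        if (!has_last_ || now - last_ >= interval_) {
            last_ = now;
            has_last_ = true;
            return true;
        }
        return false;
    }

private:
    std::chrono::milliseconds interval_;
    Clock::time_point last_{};
    bool has_last_ = false;
};
```

In the scan loop, the status text would then be set only when should_update(UpdateThrottle::Clock::now()) returns true, so most files cause no window message at all.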

Posted

Hmm,

I just noticed that it is CreateFile that is slowing the whole thing down.

Without CreateFile it works perfectly.

I wonder what an alternative would be?

What are the methods to open a file on the fly?

Posted

You don't need to use CreateFile to get the file size. You can get it from the WIN32_FIND_DATA structure, which is filled in by FindFirstFile/FindNextFile.

Posted

Hi starzboy

Look in masm32lib folder

filesize proc lpszFileName:DWORD
LOCAL wfd :WIN32_FIND_DATA
invoke FindFirstFile,lpszFileName,ADDR wfd
.if eax == INVALID_HANDLE_VALUE
mov eax, -1
jmp fsEnd
.endif
invoke FindClose, eax
mov eax, wfd.nFileSizeLow
fsEnd:
ret
filesize endp

Posted

What if I need to get more information from the file? I mean, what if I need to open the file and seek to some bytes in it?

Just curious.

Posted

I have made a similar program in C#, and it is multithreaded. Multithreading in .NET is a pain, but the program now works damn fast: it scans gigs of data with a filter in seconds, and if you run a search a second time on the same drive, the speed increases.

Anyway, why do you want to seek? Do you want to extract PE info or something?

Posted

Hi GEEK,

How about a file CRC? :)

Posted

The second time, it's cached by Windows. A content scan will always be slow. Also, multithreading will only help on drives with good command queuing.

And GEEK, I don't know how you can scan gigs of data in seconds.

Take, for example, an 8 GB ISO and look for a string in it.

The ISO isn't cached.

The HDD has a 100 MB/s read speed through the file system, not raw reads.

So an 8 GB scan in the best-case scenario (defragmented file, etc.) will take about 80 seconds (8,000 MB / 100 MB/s).

Posted

This is how I do Crc32 checks in C++:

DWORD dwCrc32Table[ 256 ] = {
0x00000000L, 0x77073096L, 0xEE0E612CL, 0x990951BAL, 0x076DC419L, 0x706AF48FL, 0xE963A535L, 0x9E6495A3L,
0x0EDB8832L, 0x79DCB8A4L, 0xE0D5E91EL, 0x97D2D988L, 0x09B64C2BL, 0x7EB17CBDL, 0xE7B82D07L, 0x90BF1D91L,
0x1DB71064L, 0x6AB020F2L, 0xF3B97148L, 0x84BE41DEL, 0x1ADAD47DL, 0x6DDDE4EBL, 0xF4D4B551L, 0x83D385C7L,
0x136C9856L, 0x646BA8C0L, 0xFD62F97AL, 0x8A65C9ECL, 0x14015C4FL, 0x63066CD9L, 0xFA0F3D63L, 0x8D080DF5L,
0x3B6E20C8L, 0x4C69105EL, 0xD56041E4L, 0xA2677172L, 0x3C03E4D1L, 0x4B04D447L, 0xD20D85FDL, 0xA50AB56BL,
0x35B5A8FAL, 0x42B2986CL, 0xDBBBC9D6L, 0xACBCF940L, 0x32D86CE3L, 0x45DF5C75L, 0xDCD60DCFL, 0xABD13D59L,
0x26D930ACL, 0x51DE003AL, 0xC8D75180L, 0xBFD06116L, 0x21B4F4B5L, 0x56B3C423L, 0xCFBA9599L, 0xB8BDA50FL,
0x2802B89EL, 0x5F058808L, 0xC60CD9B2L, 0xB10BE924L, 0x2F6F7C87L, 0x58684C11L, 0xC1611DABL, 0xB6662D3DL,
0x76DC4190L, 0x01DB7106L, 0x98D220BCL, 0xEFD5102AL, 0x71B18589L, 0x06B6B51FL, 0x9FBFE4A5L, 0xE8B8D433L,
0x7807C9A2L, 0x0F00F934L, 0x9609A88EL, 0xE10E9818L, 0x7F6A0DBBL, 0x086D3D2DL, 0x91646C97L, 0xE6635C01L,
0x6B6B51F4L, 0x1C6C6162L, 0x856530D8L, 0xF262004EL, 0x6C0695EDL, 0x1B01A57BL, 0x8208F4C1L, 0xF50FC457L,
0x65B0D9C6L, 0x12B7E950L, 0x8BBEB8EAL, 0xFCB9887CL, 0x62DD1DDFL, 0x15DA2D49L, 0x8CD37CF3L, 0xFBD44C65L,
0x4DB26158L, 0x3AB551CEL, 0xA3BC0074L, 0xD4BB30E2L, 0x4ADFA541L, 0x3DD895D7L, 0xA4D1C46DL, 0xD3D6F4FBL,
0x4369E96AL, 0x346ED9FCL, 0xAD678846L, 0xDA60B8D0L, 0x44042D73L, 0x33031DE5L, 0xAA0A4C5FL, 0xDD0D7CC9L,
0x5005713CL, 0x270241AAL, 0xBE0B1010L, 0xC90C2086L, 0x5768B525L, 0x206F85B3L, 0xB966D409L, 0xCE61E49FL,
0x5EDEF90EL, 0x29D9C998L, 0xB0D09822L, 0xC7D7A8B4L, 0x59B33D17L, 0x2EB40D81L, 0xB7BD5C3BL, 0xC0BA6CADL,
0xEDB88320L, 0x9ABFB3B6L, 0x03B6E20CL, 0x74B1D29AL, 0xEAD54739L, 0x9DD277AFL, 0x04DB2615L, 0x73DC1683L,
0xE3630B12L, 0x94643B84L, 0x0D6D6A3EL, 0x7A6A5AA8L, 0xE40ECF0BL, 0x9309FF9DL, 0x0A00AE27L, 0x7D079EB1L,
0xF00F9344L, 0x8708A3D2L, 0x1E01F268L, 0x6906C2FEL, 0xF762575DL, 0x806567CBL, 0x196C3671L, 0x6E6B06E7L,
0xFED41B76L, 0x89D32BE0L, 0x10DA7A5AL, 0x67DD4ACCL, 0xF9B9DF6FL, 0x8EBEEFF9L, 0x17B7BE43L, 0x60B08ED5L,
0xD6D6A3E8L, 0xA1D1937EL, 0x38D8C2C4L, 0x4FDFF252L, 0xD1BB67F1L, 0xA6BC5767L, 0x3FB506DDL, 0x48B2364BL,
0xD80D2BDAL, 0xAF0A1B4CL, 0x36034AF6L, 0x41047A60L, 0xDF60EFC3L, 0xA867DF55L, 0x316E8EEFL, 0x4669BE79L,
0xCB61B38CL, 0xBC66831AL, 0x256FD2A0L, 0x5268E236L, 0xCC0C7795L, 0xBB0B4703L, 0x220216B9L, 0x5505262FL,
0xC5BA3BBEL, 0xB2BD0B28L, 0x2BB45A92L, 0x5CB36A04L, 0xC2D7FFA7L, 0xB5D0CF31L, 0x2CD99E8BL, 0x5BDEAE1DL,
0x9B64C2B0L, 0xEC63F226L, 0x756AA39CL, 0x026D930AL, 0x9C0906A9L, 0xEB0E363FL, 0x72076785L, 0x05005713L,
0x95BF4A82L, 0xE2B87A14L, 0x7BB12BAEL, 0x0CB61B38L, 0x92D28E9BL, 0xE5D5BE0DL, 0x7CDCEFB7L, 0x0BDBDF21L,
0x86D3D2D4L, 0xF1D4E242L, 0x68DDB3F8L, 0x1FDA836EL, 0x81BE16CDL, 0xF6B9265BL, 0x6FB077E1L, 0x18B74777L,
0x88085AE6L, 0xFF0F6A70L, 0x66063BCAL, 0x11010B5CL, 0x8F659EFFL, 0xF862AE69L, 0x616BFFD3L, 0x166CCF45L,
0xA00AE278L, 0xD70DD2EEL, 0x4E048354L, 0x3903B3C2L, 0xA7672661L, 0xD06016F7L, 0x4969474DL, 0x3E6E77DBL,
0xAED16A4AL, 0xD9D65ADCL, 0x40DF0B66L, 0x37D83BF0L, 0xA9BCAE53L, 0xDEBB9EC5L, 0x47B2CF7FL, 0x30B5FFE9L,
0xBDBDF21CL, 0xCABAC28AL, 0x53B39330L, 0x24B4A3A6L, 0xBAD03605L, 0xCDD70693L, 0x54DE5729L, 0x23D967BFL,
0xB3667A2EL, 0xC4614AB8L, 0x5D681B02L, 0x2A6F2B94L, 0xB40BBE37L, 0xC30C8EA1L, 0x5A05DF1BL, 0x2D02EF8DL
};
DWORD CPatch::_CalculateCRC()
{
if( this->m_hFileHandle == INVALID_HANDLE_VALUE )
return 0;

DWORD dwFileSize = GetFileSize( this->m_hFileHandle, 0 );
if( dwFileSize == 0 || dwFileSize == INVALID_FILE_SIZE )
return 0;

DWORD dwMappedFile = (DWORD)m_lpMappedFile;
DWORD dwCrc32 = 0xFFFFFFFF;
_asm
{
push esi
push edi
mov ecx, dwCrc32
lea edi, dwCrc32Table
mov esi, dword ptr ds:[dwMappedFile]
mov edx, esi
add edx, dwFileSize
loopcrc32:
xor eax, eax
mov bl, byte ptr [esi]
mov al, cl
inc esi
xor al, bl
shr ecx, 8
mov ebx, [edi+eax*4]
xor ecx, ebx
cmp edx, esi
jne loopcrc32
pop edi
pop esi
mov dwCrc32, ecx
not dwCrc32
}
return dwCrc32;
}

It's inline ASM, so I figure you should be able to convert it as needed easily. This is taken from a small patching class I made based on some of y0da's old work. Some quick info about things that aren't shown here:

this->m_hFileHandle: the handle returned from CreateFile when the file is opened.

this->m_lpMappedFile: the LPVOID pointer returned from MapViewOfFile on the file.

So, a quick rundown of this:

- CreateFile: Open the file.

- CreateFileMapping: Create a mapping of the opened file.

- MapViewOfFile: Map the file into memory.
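For readers who want the same table-driven loop without inline assembly, here is a minimal portable C++ sketch. It assumes the caller already has the file bytes in memory (for example, the pointer from MapViewOfFile); the names make_crc32_table and crc32_bytes are hypothetical. The table is generated from the reflected polynomial 0xEDB88320, which produces the same 256 values as the dwCrc32Table listed above.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Builds the 256-entry lookup table for the reflected polynomial 0xEDB88320.
std::array<std::uint32_t, 256> make_crc32_table() {
    std::array<std::uint32_t, 256> table{};
    for (std::uint32_t i = 0; i < 256; ++i) {
        std::uint32_t c = i;
        for (int bit = 0; bit < 8; ++bit) {
            c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
        }
        table[i] = c;
    }
    return table;
}

// Computes the CRC-32 of a memory buffer, one byte per table lookup.
// Start value 0xFFFFFFFF and final bitwise NOT mirror the asm loop above.
std::uint32_t crc32_bytes(const void* data, std::size_t length) {
    static const std::array<std::uint32_t, 256> table = make_crc32_table();
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < length; ++i) {
        crc = table[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
    }
    return ~crc;
}
```

The standard check value for this CRC variant is 0xCBF43926 for the nine ASCII bytes "123456789", which is a quick way to verify a port.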

Posted

Atomos... bro, read my first post.

Getting a CRC via a file mapping view is no big deal...

Posted
Quote: "Atomos... bro read my first post. Getting CRc by Filemapview is no big deal..."

I was referring to your post that mentioned CRCs. As for the code you posted above: why are you opening the file in the first place for most of the info you are getting? The structure used with FindFirstFile/FindNextFile already has most of the info you obtain by opening the file.

typedef struct _WIN32_FIND_DATA {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
DWORD nFileSizeHigh;
DWORD nFileSizeLow;
DWORD dwReserved0;
DWORD dwReserved1;
TCHAR cFileName[MAX_PATH];
TCHAR cAlternateFileName[14];
} WIN32_FIND_DATA,
*PWIN32_FIND_DATA,
*LPWIN32_FIND_DATA;

You have the file size already inside this as well as the file attributes. No need to open the file and use more calls and such to obtain that.

The size is calculated (in 64-bit arithmetic, since 0xFFFFFFFF+1 overflows to 0 in a 32-bit DWORD) via:

(nFileSizeHigh * (0xFFFFFFFF+1)) + nFileSizeLow
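In code, the same calculation is usually written as a shift into a 64-bit value, which avoids the overflow. A minimal sketch, using plain fixed-width integers in place of DWORD so it compiles anywhere; combine_file_size is a hypothetical name.

```cpp
#include <cstdint>

// Combines the two 32-bit halves from WIN32_FIND_DATA (nFileSizeHigh,
// nFileSizeLow) into one 64-bit size. The cast must happen before the
// shift, otherwise the high half is shifted out of a 32-bit value.
std::uint64_t combine_file_size(std::uint32_t high, std::uint32_t low) {
    return (static_cast<std::uint64_t>(high) << 32) | low;
}
```

For files under 4 GB, nFileSizeHigh is 0 and the result is simply nFileSizeLow, which is why the masm32lib filesize routine gets away with returning only the low half.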

Posted

@Atomos

I didn't mean to offend you in the first place.

Bro, I use the recursive scan to get the files, and I need each file's CRC; that is why I open them. Can we end up with something faster?

Thank you.

Posted

Nope. If you always want to be sure that the file CRC is OK and the file content hasn't changed, the only way is to calculate it again for every file.

A long and slow process.

Posted

The slow part isn't parsing the file list but opening the files. Unless you find a way to get some sort of file system CRC (maybe a native NTFS checksum or something) without loading every file into memory and calculating the CRC, it's not going to get any faster...

Posted (edited)

There is no such thing.

Write one byte, and NTFS would have to recalculate the whole CRC for an 8 GB file?

And even using a per-sector CRC would not be faster, because you read the whole sector, so it's like reading the whole file.

Edited by human
  • 3 weeks later...
Posted

OK guys, I am stuck once again.

How do I get the file creation time and the other times via WIN32_FIND_DATA?

I seem to be so stuck!

Posted

You can use the GetFileTime function...

http://msdn.microsoft.com/en-us/library/ms724320(VS.85).aspx
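Note that the WIN32_FIND_DATA structure quoted earlier in the thread already carries ftCreationTime, ftLastAccessTime and ftLastWriteTime, so the find loop has these values without opening the file. Each is a FILETIME: two 32-bit halves forming a 64-bit count of 100-nanosecond intervals since January 1, 1601 (UTC). A minimal sketch of turning such a value into Unix seconds, using plain integers in place of the FILETIME struct; filetime_to_unix_seconds is a hypothetical name.

```cpp
#include <cstdint>

// Converts a FILETIME, given as its dwHighDateTime and dwLowDateTime halves,
// to seconds since 1970-01-01 UTC.
std::int64_t filetime_to_unix_seconds(std::uint32_t high, std::uint32_t low) {
    // 64-bit tick count: 100 ns units since 1601-01-01 UTC.
    const std::uint64_t ticks = (static_cast<std::uint64_t>(high) << 32) | low;
    // Ticks between 1601-01-01 and 1970-01-01.
    const std::int64_t kEpochDifference = 116444736000000000LL;
    const std::int64_t kTicksPerSecond = 10000000LL;
    return (static_cast<std::int64_t>(ticks) - kEpochDifference) / kTicksPerSecond;
}
```

On Windows itself, FileTimeToSystemTime gives a broken-down date directly; the arithmetic above is only meant to show what the two DWORDs contain.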
