starzboy (Author) Posted November 26, 2008
Hello guys, I have written a little recursive scan function that lists all files, with their sizes, inside a directory and its subdirectories. The function seems to work fine; the only problem is that it is very slow. I have tried using threads, but it's no better. Actually, I am listing the files to find duplicates with the same name and size. Can someone suggest a better approach? Can anything be done with drivers to speed things up?
cya, starz
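Since the goal is finding files with the same name and size, one way is to keep every (name, size) pair in memory during the scan and sort them so that matches end up next to each other. A minimal sketch in C, assuming the scan has already filled an array; the FileEntry type and the function names are hypothetical, and the case-insensitive compare only handles ASCII names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* One scanned file: only the two keys used for duplicate matching. */
typedef struct {
    const char *name;   /* file name only, without the folder */
    uint64_t    size;   /* full 64-bit size in bytes */
} FileEntry;

/* Case-insensitive compare, since Windows file names ignore case (ASCII only here). */
static int name_cmp(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        ++a;
        ++b;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

/* Order by size first, then by name, so equal (name, size) pairs become neighbours. */
static int entry_cmp(const void *pa, const void *pb)
{
    const FileEntry *a = pa, *b = pb;
    if (a->size != b->size)
        return a->size < b->size ? -1 : 1;
    return name_cmp(a->name, b->name);
}

/* Sorts the array in place, prints each extra copy and returns how many there are. */
static size_t count_duplicates(FileEntry *files, size_t n)
{
    size_t dups = 0;
    assert(files != NULL || n == 0);
    qsort(files, n, sizeof *files, entry_cmp);
    for (size_t i = 1; i < n; ++i) {
        if (entry_cmp(&files[i - 1], &files[i]) == 0) {
            printf("duplicate: %s (%llu bytes)\n", files[i].name,
                   (unsigned long long)files[i].size);
            ++dups;
        }
    }
    return dups;
}
```

Sorting costs O(n log n) once, instead of comparing every file against every other file.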
Killboy Posted November 26, 2008
What do you mean by slow? The code I'm using to get all files from a folder (and its subfolders) isn't exactly high-speed, but it handles several hundred files per second on a halfway decent machine. I doubt threading will make it any faster, as the real limitation will be file system access. We're talking about FindFirstFile/FindNextFile etc.?
Nacho_dj Posted November 26, 2008
It would also be good to know what programming language you are using...
diablo2oo2 Posted November 26, 2008
I coded a simple index tool for my FTP server (also with file size info). It scans 20,000 files in "some seconds", using just FindFirstFile/FindNextFile... It mostly depends on your hard drive's speed. Maybe it's not defragmented?
starzboy (Author) Posted November 26, 2008 (edited)
Hey, thanks a ton for replying. Well, as usual it's MASM. Yeah, I'm using FindFirstFile and FindNextFile... I might as well post the source here. It's a bit messy, but I have the better version on my laptop...

GarbageScan proc hWnd:HWND
_ifind:
    invoke lstrcpy, addr sPath, addr GPath
    invoke lstrcat, addr GPath, chr$("*.*")
    invoke FindFirstFile, addr GPath, addr Gfnd
    mov dword ptr ds:[ffind], eax
    pushad
    invoke lstrlen, addr GPath
    mov ebx, eax
    xor ecx, ecx
    mov esi, offset GPath
_nxtpath:
    mov al, byte ptr ds:[esi]
    cmp al, 05Ch
    jne _noslash
    inc ecx
_noslash:
    dec ebx
    inc esi
    cmp ebx, 00h
    jne _nxtpath
    imul ecx, 04h
    mov dword ptr ds:[fdist], ecx
    popad
    mov edi, offset fstack
    add edi, dword ptr ds:[fdist]
    mov eax, dword ptr ds:[ffind]
    mov dword ptr ds:[edi], eax
_chkfile:
    cmp byte ptr ds:[Gfnd.cFileName], 2Eh
    je _fnext
    invoke lstrcpy, addr BPath, addr sPath
    invoke lstrcat, addr BPath, addr Gfnd.cFileName
    ; Test the directory bit from the find data (folders can carry other attribute bits too)
    test Gfnd.dwFileAttributes, FILE_ATTRIBUTE_DIRECTORY
    jz _nodir
    invoke lstrcpy, addr GPath, addr sPath
    invoke lstrcat, addr GPath, addr Gfnd.cFileName
    invoke lstrcat, addr GPath, chr$("\")
    ; Keep track of folders scanned
    inc dword ptr ds:[ffoldcount]
    jmp _ifind
_nodir:
    ;invoke SetDlgItemText, hEdit, 5001, addr sPath
    invoke SetDlgItemText, hWnd, 1008, addr sPath
    invoke CreateFile, addr BPath, GENERIC_READ+GENERIC_WRITE, FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL+FILE_ATTRIBUTE_HIDDEN, 0
    push eax
    invoke GetFileSize, eax, 00h
    invoke wsprintf, addr BPath, chr$(" %s - %d bytes."), addr Gfnd.cFileName, eax
    invoke List, hWnd, addr BPath
    pop eax
    invoke CloseHandle, eax
    ; Keep track of files scanned
    inc dword ptr ds:[ffilecount]
_fnext:
    invoke FindNextFile, dword ptr ds:[ffind], addr Gfnd
    cmp eax, 00h
    jne _chkfile
    ; Check for base: if FindNextFile = 0 in the base folder, we have reached the end :)
    invoke lstrlen, addr sPath
    cmp eax, dword ptr ds:[forgsize]
    je _fend
    ; Close old handle
    invoke FindClose, dword ptr ds:[ffind]
    ; Remove the last "\" and go to the previous folder
    invoke lstrcpy, addr GPath, addr sPath
    invoke lstrlen, addr GPath
    mov ebx, eax
    mov esi, offset GPath
    add esi, ebx
    dec esi
    mov byte ptr ds:[esi], 00h
_nxtbkfnd:
    mov al, byte ptr ds:[esi]
    cmp al, 05Ch
    je _sslash
    mov byte ptr ds:[esi], 00h
    dec esi
    jmp _nxtbkfnd
_sslash:
    ; Make sPath = GPath after restoring the previous folder :)
    invoke lstrcpy, addr sPath, addr GPath
    ; When the length of the current dir matches the original dir, load the handle
    ; of the original folder and do FindNextFile; if FindNextFile fails, end the scan
    invoke lstrlen, addr sPath
    cmp eax, dword ptr ds:[forgsize]
    jne _notyet
    nop
    pushad
    invoke lstrlen, addr GPath
    mov ebx, eax
    xor ecx, ecx
    mov esi, offset GPath
_nxtpath2:
    mov al, byte ptr ds:[esi]
    cmp al, 05Ch
    jne _noslash2
    inc ecx
_noslash2:
    dec ebx
    inc esi
    cmp ebx, 00h
    jne _nxtpath2
    imul ecx, 04h
    mov dword ptr ds:[fdist], ecx
    popad
    ; Final check whether there really are no more files to scan :)
    mov edi, offset fstack
    add edi, dword ptr ds:[fdist]
    mov eax, dword ptr ds:[edi]
    mov dword ptr ds:[ffind], eax
    invoke FindNextFile, dword ptr ds:[ffind], addr Gfnd
    cmp eax, 00h
    jne _chkfile
    nop
    jmp _fend
_notyet:
    ; Load the handle of the previous folder and do FindNextFile
    pushad
    invoke lstrlen, addr GPath
    mov ebx, eax
    xor ecx, ecx
    mov esi, offset GPath
_nxtpath1:
    mov al, byte ptr ds:[esi]
    cmp al, 05Ch
    jne _noslash1
    inc ecx
_noslash1:
    dec ebx
    inc esi
    cmp ebx, 00h
    jne _nxtpath1
    imul ecx, 04h
    mov dword ptr ds:[fdist], ecx
    popad
    mov edi, offset fstack
    add edi, dword ptr ds:[fdist]
    mov eax, dword ptr ds:[edi]
    mov dword ptr ds:[ffind], eax
    jmp _fnext
_fend:
    invoke FindClose, dword ptr ds:[ffind]
    ; State the files and folders scanned
    mov eax, dword ptr ds:[ffilecount]
    mov ecx, dword ptr ds:[ffoldcount]
    invoke wsprintf, addr BPath, chr$(" %d Files and %d Folders Scanned."), eax, ecx
    invoke SetDlgItemText, hWnd, 1008, addr BPath
    xor eax, eax
    ret
GarbageScan EndP

Binary file attached below for debugging... The better version is much faster.
Thanks again
customsearch.rar
Edited November 26, 2008 by starzboy
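The routine above keeps one find handle per folder level in fstack. It uses the number of backslashes in the path as the level, times 4 (the size of a handle), and returns to the parent by cutting the path back to the previous backslash. The same two steps in portable C, as a sketch; path_depth and go_to_parent are hypothetical names, and the path is assumed to use "\" separators like sPath.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Folder level = number of '\' separators in the path.
   The MASM code multiplies this by 4 (one DWORD handle per level)
   to get its byte offset into fstack. */
static size_t path_depth(const char *path)
{
    size_t depth = 0;
    assert(path != NULL);
    for (; *path != '\0'; ++path)
        if (*path == '\\')
            ++depth;
    return depth;
}

/* Cut the path back to its parent folder, keeping the trailing '\':
   "C:\a\b\" becomes "C:\a\". Returns 1 on success, 0 if no parent is left. */
static int go_to_parent(char *path)
{
    size_t len;
    assert(path != NULL);
    len = strlen(path);
    if (len > 0 && path[len - 1] == '\\')
        path[--len] = '\0';          /* drop the trailing separator */
    while (len > 0 && path[len - 1] != '\\')
        path[--len] = '\0';          /* erase the last folder name */
    return len > 0;
}
```

With ordinary recursion (one FindFirstFile handle per call on the C stack), neither helper is needed, because each call level holds its own handle and path.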
diablo2oo2 Posted November 26, 2008
I think the problem is the window operations like SetDlgItemText. Better to remove that call, or create a thread that posts the current file only every 500 milliseconds. Also, a listview with many entries performs poorly; you can use a virtual listview instead, or just save all the files into memory first and produce the output after scanning.
starzboy (Author) Posted November 26, 2008
Hmm, I just noticed that it is CreateFile that slows the whole thing down. Without CreateFile it works perfectly. I wonder what an alternative would be? What are the methods to open a file on the fly?
diablo2oo2 Posted November 26, 2008
You don't need CreateFile to get the file size. You can get it from the WIN32_FIND_DATA structure, which is filled in by FindFirstFile/FindNextFile.
ragdog Posted November 26, 2008
Hi starzboy, look in the masm32 lib folder:

filesize proc lpszFileName:DWORD
    LOCAL wfd:WIN32_FIND_DATA
    invoke FindFirstFile, lpszFileName, ADDR wfd
    .if eax == INVALID_HANDLE_VALUE
        mov eax, -1                 ; file not found
        jmp fsEnd
    .endif
    invoke FindClose, eax
    mov eax, wfd.nFileSizeLow       ; low 32 bits of the size only
  fsEnd:
    ret
filesize endp
starzboy (Author) Posted November 26, 2008
What if I need to get more information from the file? I mean, what if I need to open the file and seek to some data, for example read some bytes at an offset? Just curious.
GEEK Posted November 27, 2008
I have made a similar program in C#, and it's multithreaded. Multithreading in .NET is a pain, but the program now works damn fast, scanning gigs of data with a filter in seconds, and if you search the same drive a second time it gets even faster. Anyway, why do you want to seek? Do you want to extract PE info or something?
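On the "open the file and seek some bytes" question: reading a few bytes at a known offset costs the open plus one small read, unlike hashing the whole file. Below is a sketch that uses standard C stdio (fseek/fread) in place of the Win32 SetFilePointer/ReadFile pair. It takes the PE case GEEK mentions as its example: a PE file starts with "MZ", and the little-endian 32-bit value at offset 0x3C (e_lfanew) gives the offset of the PE header. read_at and pe_header_offset are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Read n bytes at a given offset; returns the number of bytes actually read. */
static size_t read_at(FILE *f, long offset, unsigned char *buf, size_t n)
{
    if (fseek(f, offset, SEEK_SET) != 0)
        return 0;
    return fread(buf, 1, n, f);
}

/* Returns the PE header offset (e_lfanew, stored little-endian at 0x3C
   in the DOS header), or -1 if the file does not start with "MZ". */
static long pe_header_offset(FILE *f)
{
    unsigned char mz[2], lf[4];
    assert(f != NULL);
    if (read_at(f, 0, mz, 2) != 2 || mz[0] != 'M' || mz[1] != 'Z')
        return -1;
    if (read_at(f, 0x3C, lf, 4) != 4)
        return -1;
    return (long)((uint32_t)lf[0] | (uint32_t)lf[1] << 8 |
                  (uint32_t)lf[2] << 16 | (uint32_t)lf[3] << 24);
}
```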
human Posted November 27, 2008
The second time, it's cached by Windows. Content scanning will always be slow. Also, multithreading only helps on drives with good command queuing. And GEEK, I don't know how you can scan gigs of data in seconds. Take, for example, an 8 GB ISO and look for a string in it: the ISO isn't cached, and the HDD reads at 100 MB/s through the file system (not raw reads). Even in the best case (defragmented file etc.), scanning 8 GB will take 80 seconds.
atom0s Posted November 28, 2008
This is how I do CRC32 checks in C++:

DWORD dwCrc32Table[ 256 ] =
{
    0x00000000L, 0x77073096L, 0xEE0E612CL, 0x990951BAL, 0x076DC419L, 0x706AF48FL, 0xE963A535L, 0x9E6495A3L,
    0x0EDB8832L, 0x79DCB8A4L, 0xE0D5E91EL, 0x97D2D988L, 0x09B64C2BL, 0x7EB17CBDL, 0xE7B82D07L, 0x90BF1D91L,
    0x1DB71064L, 0x6AB020F2L, 0xF3B97148L, 0x84BE41DEL, 0x1ADAD47DL, 0x6DDDE4EBL, 0xF4D4B551L, 0x83D385C7L,
    0x136C9856L, 0x646BA8C0L, 0xFD62F97AL, 0x8A65C9ECL, 0x14015C4FL, 0x63066CD9L, 0xFA0F3D63L, 0x8D080DF5L,
    0x3B6E20C8L, 0x4C69105EL, 0xD56041E4L, 0xA2677172L, 0x3C03E4D1L, 0x4B04D447L, 0xD20D85FDL, 0xA50AB56BL,
    0x35B5A8FAL, 0x42B2986CL, 0xDBBBC9D6L, 0xACBCF940L, 0x32D86CE3L, 0x45DF5C75L, 0xDCD60DCFL, 0xABD13D59L,
    0x26D930ACL, 0x51DE003AL, 0xC8D75180L, 0xBFD06116L, 0x21B4F4B5L, 0x56B3C423L, 0xCFBA9599L, 0xB8BDA50FL,
    0x2802B89EL, 0x5F058808L, 0xC60CD9B2L, 0xB10BE924L, 0x2F6F7C87L, 0x58684C11L, 0xC1611DABL, 0xB6662D3DL,
    0x76DC4190L, 0x01DB7106L, 0x98D220BCL, 0xEFD5102AL, 0x71B18589L, 0x06B6B51FL, 0x9FBFE4A5L, 0xE8B8D433L,
    0x7807C9A2L, 0x0F00F934L, 0x9609A88EL, 0xE10E9818L, 0x7F6A0DBBL, 0x086D3D2DL, 0x91646C97L, 0xE6635C01L,
    0x6B6B51F4L, 0x1C6C6162L, 0x856530D8L, 0xF262004EL, 0x6C0695EDL, 0x1B01A57BL, 0x8208F4C1L, 0xF50FC457L,
    0x65B0D9C6L, 0x12B7E950L, 0x8BBEB8EAL, 0xFCB9887CL, 0x62DD1DDFL, 0x15DA2D49L, 0x8CD37CF3L, 0xFBD44C65L,
    0x4DB26158L, 0x3AB551CEL, 0xA3BC0074L, 0xD4BB30E2L, 0x4ADFA541L, 0x3DD895D7L, 0xA4D1C46DL, 0xD3D6F4FBL,
    0x4369E96AL, 0x346ED9FCL, 0xAD678846L, 0xDA60B8D0L, 0x44042D73L, 0x33031DE5L, 0xAA0A4C5FL, 0xDD0D7CC9L,
    0x5005713CL, 0x270241AAL, 0xBE0B1010L, 0xC90C2086L, 0x5768B525L, 0x206F85B3L, 0xB966D409L, 0xCE61E49FL,
    0x5EDEF90EL, 0x29D9C998L, 0xB0D09822L, 0xC7D7A8B4L, 0x59B33D17L, 0x2EB40D81L, 0xB7BD5C3BL, 0xC0BA6CADL,
    0xEDB88320L, 0x9ABFB3B6L, 0x03B6E20CL, 0x74B1D29AL, 0xEAD54739L, 0x9DD277AFL, 0x04DB2615L, 0x73DC1683L,
    0xE3630B12L, 0x94643B84L, 0x0D6D6A3EL, 0x7A6A5AA8L, 0xE40ECF0BL, 0x9309FF9DL, 0x0A00AE27L, 0x7D079EB1L,
    0xF00F9344L, 0x8708A3D2L, 0x1E01F268L, 0x6906C2FEL, 0xF762575DL, 0x806567CBL, 0x196C3671L, 0x6E6B06E7L,
    0xFED41B76L, 0x89D32BE0L, 0x10DA7A5AL, 0x67DD4ACCL, 0xF9B9DF6FL, 0x8EBEEFF9L, 0x17B7BE43L, 0x60B08ED5L,
    0xD6D6A3E8L, 0xA1D1937EL, 0x38D8C2C4L, 0x4FDFF252L, 0xD1BB67F1L, 0xA6BC5767L, 0x3FB506DDL, 0x48B2364BL,
    0xD80D2BDAL, 0xAF0A1B4CL, 0x36034AF6L, 0x41047A60L, 0xDF60EFC3L, 0xA867DF55L, 0x316E8EEFL, 0x4669BE79L,
    0xCB61B38CL, 0xBC66831AL, 0x256FD2A0L, 0x5268E236L, 0xCC0C7795L, 0xBB0B4703L, 0x220216B9L, 0x5505262FL,
    0xC5BA3BBEL, 0xB2BD0B28L, 0x2BB45A92L, 0x5CB36A04L, 0xC2D7FFA7L, 0xB5D0CF31L, 0x2CD99E8BL, 0x5BDEAE1DL,
    0x9B64C2B0L, 0xEC63F226L, 0x756AA39CL, 0x026D930AL, 0x9C0906A9L, 0xEB0E363FL, 0x72076785L, 0x05005713L,
    0x95BF4A82L, 0xE2B87A14L, 0x7BB12BAEL, 0x0CB61B38L, 0x92D28E9BL, 0xE5D5BE0DL, 0x7CDCEFB7L, 0x0BDBDF21L,
    0x86D3D2D4L, 0xF1D4E242L, 0x68DDB3F8L, 0x1FDA836EL, 0x81BE16CDL, 0xF6B9265BL, 0x6FB077E1L, 0x18B74777L,
    0x88085AE6L, 0xFF0F6A70L, 0x66063BCAL, 0x11010B5CL, 0x8F659EFFL, 0xF862AE69L, 0x616BFFD3L, 0x166CCF45L,
    0xA00AE278L, 0xD70DD2EEL, 0x4E048354L, 0x3903B3C2L, 0xA7672661L, 0xD06016F7L, 0x4969474DL, 0x3E6E77DBL,
    0xAED16A4AL, 0xD9D65ADCL, 0x40DF0B66L, 0x37D83BF0L, 0xA9BCAE53L, 0xDEBB9EC5L, 0x47B2CF7FL, 0x30B5FFE9L,
    0xBDBDF21CL, 0xCABAC28AL, 0x53B39330L, 0x24B4A3A6L, 0xBAD03605L, 0xCDD70693L, 0x54DE5729L, 0x23D967BFL,
    0xB3667A2EL, 0xC4614AB8L, 0x5D681B02L, 0x2A6F2B94L, 0xB40BBE37L, 0xC30C8EA1L, 0x5A05DF1BL, 0x2D02EF8DL
};

DWORD CPatch::_CalculateCRC()
{
    if( this->m_hFileHandle == INVALID_HANDLE_VALUE )
        return 0;

    DWORD dwFileSize = GetFileSize( this->m_hFileHandle, 0 );
    if( dwFileSize == 0 || dwFileSize == INVALID_FILE_SIZE )
        return 0;

    DWORD dwMappedFile = (DWORD)m_lpMappedFile;
    DWORD dwCrc32 = 0xFFFFFFFF;

    _asm
    {
        push esi
        push edi

        mov ecx, dwCrc32
        lea edi, dwCrc32Table
        mov esi, dword ptr ds:[dwMappedFile]
        mov edx, esi
        add edx, dwFileSize

loopcrc32:
        xor eax, eax
        mov bl, byte ptr [esi]
        mov al, cl
        inc esi
        xor al, bl
        shr ecx, 8
        mov ebx, [edi+eax*4]
        xor ecx, ebx
        cmp edx, esi
        jne loopcrc32

        pop edi
        pop esi

        mov dwCrc32, ecx
        not dwCrc32
    }

    return dwCrc32;
}

It's inline ASM, so I figure you can convert it as needed easily. This is taken from a small patching class I made based on some of y0da's old work. Some quick info about things that aren't shown here:
this->m_hFileHandle is the handle returned from CreateFile when the file is opened.
this->m_lpMappedFile is the LPVOID pointer returned from MapViewOfFile on the file.
So, a quick rundown of this:
- CreateFile: open the file.
- CreateFileMapping: create a mapping of the opened file.
- MapViewOfFile: map the file into memory.
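The table in that code is the standard CRC-32 table (reflected polynomial 0xEDB88320), so it can also be generated at startup instead of pasted in. Here is a portable C sketch of the same byte-at-a-time loop. It assumes you feed it buffers yourself, and splitting out an update step lets you CRC a file in chunks (for example, blocks read with ReadFile) rather than mapping a very large file in one view. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Builds the same 256-entry table as dwCrc32Table
   (reflected polynomial 0xEDB88320). Call once before crc32_update. */
static void crc32_init_table(void)
{
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int k = 0; k < 8; ++k)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
}

/* Same step as the inline asm: index the table with (crc ^ byte) & 0xFF,
   then xor with crc shifted right by 8. Can be fed one chunk at a time. */
static uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        crc = crc_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return crc;
}

/* Start from 0xFFFFFFFF and finish with a bitwise NOT, as the asm does. */
static uint32_t crc32_buffer(const unsigned char *buf, size_t len)
{
    return ~crc32_update(0xFFFFFFFFu, buf, len);
}
```

For the nine bytes "123456789", crc32_buffer gives the well-known CRC-32 check value 0xCBF43926, which is a quick way to confirm a table or a port is correct.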
starzboy (Author) Posted November 28, 2008
Atomos... bro, read my first post. Getting the CRC via a file mapping view is no big deal...
atom0s Posted November 28, 2008
Quote: "Atomos... bro, read my first post. Getting the CRC via a file mapping view is no big deal..."
I was referring to your post that mentioned CRCs. As for your post above with the code you made, why are you opening the file in the first place for most of the info you are getting? The structure used with FindFirstFile/FindNextFile already holds most of the info you obtain by opening the file:

typedef struct _WIN32_FIND_DATA {
    DWORD    dwFileAttributes;
    FILETIME ftCreationTime;
    FILETIME ftLastAccessTime;
    FILETIME ftLastWriteTime;
    DWORD    nFileSizeHigh;
    DWORD    nFileSizeLow;
    DWORD    dwReserved0;
    DWORD    dwReserved1;
    TCHAR    cFileName[MAX_PATH];
    TCHAR    cAlternateFileName[14];
} WIN32_FIND_DATA, *PWIN32_FIND_DATA, *LPWIN32_FIND_DATA;

You already have the file size in there, as well as the file attributes. No need to open the file and make more calls to get them.
The size is calculated as:
(nFileSizeHigh * (0xFFFFFFFF+1)) + nFileSizeLow
Do this in 64-bit arithmetic, e.g. ((ULONGLONG)nFileSizeHigh << 32) | nFileSizeLow; in 32-bit arithmetic 0xFFFFFFFF+1 wraps to 0 and the high part is lost.
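The same calculation in C, with the two halves passed in as plain 32-bit values. The function name is hypothetical; on Windows you would pass fd.nFileSizeHigh and fd.nFileSizeLow from the WIN32_FIND_DATA filled in by FindFirstFile/FindNextFile.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two DWORD halves from WIN32_FIND_DATA into one 64-bit size.
   The shift must happen in 64 bits: in 32-bit arithmetic 0xFFFFFFFF + 1
   wraps to 0 and the high half would be lost. */
static uint64_t find_data_size(uint32_t nFileSizeHigh, uint32_t nFileSizeLow)
{
    return ((uint64_t)nFileSizeHigh << 32) | nFileSizeLow;
}
```

Only files of 4 GB or more have a non-zero nFileSizeHigh, which is why code that reads only nFileSizeLow (like the masm32 filesize routine) usually looks correct.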
starzboy (Author) Posted November 28, 2008
@Atomos I didn't mean to offend you in the first place. Bro, I use the recursive scan to get the files, and I need each file's CRC; that is why I open them. Can we come up with something faster?
Thank you
human Posted November 28, 2008
Nope. If you always want to be sure that a file's CRC is correct and its content hasn't changed, the only way is to calculate it again for every file. A long and slow process.
Killboy Posted November 28, 2008
The slow part isn't parsing the file list but opening the files. Unless you find a way to get some sort of file system CRC (maybe a native NTFS checksum or something) without loading every file into memory and calculating the CRC, it's not going to get any faster...
human Posted November 28, 2008 (edited)
There is no such thing. If you write one byte, should NTFS recalculate the CRC of a whole 8 GB file? And even a per-sector CRC would not be faster, because you still read every sector, so it's like reading the whole file.
Edited November 28, 2008 by human
starzboy (Author) Posted December 14, 2008
OK guys, I am stuck once again. How do I get the file creation time and the other times via WIN32_FIND_DATA? I seem to be so stuck!
Killboy Posted December 14, 2008 (edited)

SYSTEMTIME SystemTime;
char Buffer[64];
// FILETIME is a struct, so check both halves instead of comparing it with 0
if(FindData.ftCreationTime.dwLowDateTime != 0 || FindData.ftCreationTime.dwHighDateTime != 0)
{
    FileTimeToSystemTime(&FindData.ftCreationTime, &SystemTime); // result is UTC
    sprintf(Buffer, "Created in %d", SystemTime.wYear);
}

Those are all the members you can access:
http://msdn.microsoft.com/en-us/library/aa365740(VS.85).aspx
Edited December 14, 2008 by Killboy
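To show what FileTimeToSystemTime is working with, here is a portable sketch that decodes the two FILETIME halves by hand. It assumes the standard FILETIME definition (100-nanosecond ticks since January 1, 1601 UTC); filetime_year_utc is a hypothetical name, and like FileTimeToSystemTime the result is UTC, not local time.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch). */
#define EPOCH_DIFF_SECONDS 11644473600LL

/* FILETIME counts 100-nanosecond intervals since 1601-01-01 UTC.
   Convert the two halves to Unix seconds, then let gmtime() split out the year.
   Returns -1 if gmtime() cannot represent the value. */
static int filetime_year_utc(uint32_t dwHighDateTime, uint32_t dwLowDateTime)
{
    uint64_t ticks = ((uint64_t)dwHighDateTime << 32) | dwLowDateTime;
    time_t secs = (time_t)((int64_t)(ticks / 10000000u) - EPOCH_DIFF_SECONDS);
    struct tm *utc = gmtime(&secs);
    return utc ? utc->tm_year + 1900 : -1;
}
```

For example, the FILETIME value 116444736000000000 is exactly the Unix epoch, so it decodes to the year 1970.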
Nacho_dj Posted December 14, 2008
You can also use the GetFileTime function, although it needs an open file handle:
http://msdn.microsoft.com/en-us/library/ms724320(VS.85).aspx