Mondo Posted August 17, 2011 Posted August 17, 2011 Hi Guys,I've recently written some code in C which searches for a 40-bit key used to encrypt a known plain text. The code works and is able to find a known key to find for a given cipher text.The trouble is that is that rate of keys searched is very low, say 4-5,000 keys/sec.Despite running this on an old PC (Athlon XP 2200+) I know I should be seeing much faster results.For example, using APDFPR as a benchmark, I can see an average rate of 1.7x10^6k/s.My compiler settings (GCC and VC6) are optimised for speed and for my processor. I've also replaced the array subscripting with pointers as I remember reading that this is a faster method. Neither of these made any impact on speed up.I wanted to ask if anyone has a fast implementation of the RC4 they would like to share?I'd be interested in something coded in assembler that uses the MMX registers. I would be happy to a least match the key search performance of APDFPR. I'd also be very also interested in source code that uses the CUDA extensions to achieve even faster speeds.The author of this paper was able to achieve 6.5x10^6 key searches per second using a 9800GX2 card. I tried contacting the author directly but he can no longer be reached on the supplied email address. I would be absolutely delighted if I could get similar results to these.Can anyone help please?Thanks.
Killboy Posted August 17, 2011 Posted August 17, 2011 For one, VC6 is slow crap, so unless you use another compiler along with it, this could be part of your problem.I'm not entirely familiar with RC, from a quick glance it's fairly hard to optimize the algo itself. You might have better luck running the algorithm in parallel on different keys, either through threads or SSE.Would be great if you could post you code.
Aguila Posted August 17, 2011 Posted August 17, 2011 Maybe your problem is the key generation. On a 40-Bit RC4 encryption you dont need a string key generation algorithm.Maybe you do something like this:1) generate string key e.g. "aaa"2) test "aaa" as key3) generate new string keyyour keysize is only 40 bit -> 2^40 possible keys, so test each key for 100% successful decryptionstart by key 0x00 00 00 00 01until 0xFF FF FF FF FFthis are just some bit operations and should be fast
Mondo Posted August 17, 2011 Author Posted August 17, 2011 Thank you for the replies so far. @Killboy, I'll try installing a copy of Visual Studio 2008 and see if that speeds things up. I'll post the algo I am using a little later. The C algo in OpenSSL is reported to be a fast implementation so I will try this to see if it improves things. @Aguila, I am currently using a fast key generation function which works in a similar way to what you suggest so that isn't the problem.
qpt^J Posted August 18, 2011 Posted August 18, 2011 i suppose you are able to decrease O(2^40) time to more less timefor ex. if key consist only from printable characters, and i think there are 0x60-0x70 printable characters, using english and common keyboard symbols(numbers,symbols,etc), this will decrease your computation to less than O(2^20). i was always making some kind of alphabet and bruteforcing it with only those symbols, which i want. so i guess you will be able to get key in some seconds or less. but maybe your RC4 algo will slow it down till O(2^32) i guess, which will take some mins or hours (again depends on your RC4 code and pc power).
Killboy Posted August 18, 2011 Posted August 18, 2011 i suppose you are able to decrease O(2^40) time to more less time for ex. if key consist only from printable characters, and i think there are 0x60-0x70 printable characters, using english and common keyboard symbols(numbers,symbols,etc), this will decrease your computation to less than O(2^20). i was always making some kind of alphabet and bruteforcing it with only those symbols, which i want. so i guess you will be able to get key in some seconds or less. but maybe your RC4 algo will slow it down till O(2^32) i guess, which will take some mins or hours (again depends on your RC4 code and pc power). Sorry beforehead for being a nazi about this O notation is a measure of growth rate, not time. You call it time but then you equal 1 operation = 1 RC4 pass. Also, O(2^32) is the same as O(2^40), really both are O(1). I guess you mean O(n), but there is nothing like O(2^32n) either. What you meant was 2^32 RC4 passes which simply is not what O notation is used for.
qpt^J Posted August 18, 2011 Posted August 18, 2011 my bad, sorry i thought it's all same sh*t hehe, and im not that good in bruteforcing, espacially in computing valid time, as you can see.. but i suppose my suggestion is still able to increase the speed of computation of RC4 key
Mondo Posted August 18, 2011 Author Posted August 18, 2011 @qpt^J, unfortunately I cannot limit the key search space to printable chars.I'm now looking into the possibility of using optimised hand coded assembly language.
Killboy Posted August 18, 2011 Posted August 18, 2011 ASM won't be of much help here, any modern optimizing compiler will generate code that is close to 99% of the performance of pure ASM.What you could look into is some precomputation you can do for common parts of different keys or for the buffer.e.g. if you have keys 01XXXXXXXX 01YYYYYYYY, you could precompute some permutation in the RC4 state depending on the first key byte or whatever.
Recommended Posts
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now