Jump to content

The need for speed: Help with a fast RC4 key search?


Mondo

Recommended Posts

Hi Guys,

I've recently written some code in C which searches for a 40-bit key used to encrypt a known plain text. The code works and is able to find a known key to find for a given cipher text.

The trouble is that is that rate of keys searched is very low, say 4-5,000 keys/sec.

Despite running this on an old PC (Athlon XP 2200+) I know I should be seeing much faster results.

For example, using APDFPR as a benchmark, I can see an average rate of 1.7x10^6k/s.

My compiler settings (GCC and VC6) are optimised for speed and for my processor. I've also replaced the array subscripting with pointers as I remember reading that this is a faster method. Neither of these made any impact on speed up.

I wanted to ask if anyone has a fast implementation of the RC4 they would like to share?

I'd be interested in something coded in assembler that uses the MMX registers. I would be happy to a least match the key search performance of APDFPR.

I'd also be very also interested in source code that uses the CUDA extensions to achieve even faster speeds.

The author of this paper was able to achieve 6.5x10^6 key searches per second using a 9800GX2 card. I tried contacting the author directly but he can no longer be reached on the supplied email address. I would be absolutely delighted if I could get similar results to these.

Can anyone help please?

Thanks.

Link to comment

For one, VC6 is slow crap, so unless you use another compiler along with it, this could be part of your problem.

I'm not entirely familiar with RC, from a quick glance it's fairly hard to optimize the algo itself. You might have better luck running the algorithm in parallel on different keys, either through threads or SSE.

Would be great if you could post you code.

Link to comment

Maybe your problem is the key generation. On a 40-Bit RC4 encryption you dont need a string key generation algorithm.

Maybe you do something like this:

1) generate string key e.g. "aaa"

2) test "aaa" as key

3) generate new string key

your keysize is only 40 bit -> 2^40 possible keys, so test each key for 100% successful decryption

start by key 0x00 00 00 00 01

until 0xFF FF FF FF FF

this are just some bit operations and should be fast

Link to comment

Thank you for the replies so far.:)

@Killboy, I'll try installing a copy of Visual Studio 2008 and see if that speeds things up. I'll post the algo I am using a little later.

The C algo in OpenSSL is reported to be a fast implementation so I will try this to see if it improves things.

@Aguila, I am currently using a fast key generation function which works in a similar way to what you suggest so that isn't the problem.

Link to comment

i suppose you are able to decrease O(2^40) time to more less time

for ex. if key consist only from printable characters, and i think there are 0x60-0x70 printable characters, using english and common keyboard symbols(numbers,symbols,etc), this will decrease your computation to less than O(2^20). i was always making some kind of alphabet and bruteforcing it with only those symbols, which i want. so i guess you will be able to get key in some seconds or less. but maybe your RC4 algo will slow it down till O(2^32) i guess, which will take some mins or hours (again depends on your RC4 code and pc power).

Link to comment

i suppose you are able to decrease O(2^40) time to more less time

for ex. if key consist only from printable characters, and i think there are 0x60-0x70 printable characters, using english and common keyboard symbols(numbers,symbols,etc), this will decrease your computation to less than O(2^20). i was always making some kind of alphabet and bruteforcing it with only those symbols, which i want. so i guess you will be able to get key in some seconds or less. but maybe your RC4 algo will slow it down till O(2^32) i guess, which will take some mins or hours (again depends on your RC4 code and pc power).

Sorry beforehead for being a nazi about this cool.png

O notation is a measure of growth rate, not time. You call it time but then you equal 1 operation = 1 RC4 pass.

Also, O(2^32) is the same as O(2^40), really both are O(1). I guess you mean O(n), but there is nothing like O(2^32n) either.

What you meant was 2^32 RC4 passes which simply is not what O notation is used for.

Link to comment

my bad, sorry :)

i thought it's all same sh*t hehe, and im not that good in bruteforcing, espacially in computing valid time, as you can see.. but i suppose my suggestion is still able to increase the speed of computation of RC4 key ;)

Link to comment

@qpt^J, unfortunately I cannot limit the key search space to printable chars.

I'm now looking into the possibility of using optimised hand coded assembly language.

Link to comment

ASM won't be of much help here, any modern optimizing compiler will generate code that is close to 99% of the performance of pure ASM.

What you could look into is some precomputation you can do for common parts of different keys or for the buffer.

e.g. if you have keys 01XXXXXXXX 01YYYYYYYY, you could precompute some permutation in the RC4 state depending on the first key byte or whatever.

Link to comment

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...