Java vs. C

From Simson Garfinkel
Revision as of 16:03, 28 May 2009 by Simson (talk | contribs)

Several people have told me recently that Java runs as fast as C. After repeating this claim a few times myself, I decided to test it on a computer forensics workload.

For the test, I constructed a test file of 4238912226 bytes. The test reads the file in 4K-byte blocks and computes the SHA1 hash of each block.
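
As a quick check on how much work that is, the file size can be split into full 4096-byte blocks plus one short final block. This is a minimal sketch of the arithmetic; the variable names are mine and are not part of the original test programs.

```python
FILE_SIZE = 4238912226   # bytes, size of the test file
BLOCK_SIZE = 4096        # bytes read and hashed per iteration

# Number of complete blocks, and the size of the short block at the end
full_blocks, tail_bytes = divmod(FILE_SIZE, BLOCK_SIZE)

# One SHA1 computation per block, including the short final block if any
total_hashes = full_blocks + (1 if tail_bytes else 0)

print(full_blocks, tail_bytes, total_hashes)
```

This gives 1034890 full blocks and a 2786-byte tail, so each program computes 1034891 hashes per run.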

Speed in C

Here is the C program I used:

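#include <stdio.h>
#include <stdlib.h>
#include <openssl/sha.h>

/* Read a file in 4096-byte blocks and compute the SHA1 of each block. */
int main(int argc,char **argv)
{
    FILE *f = 0;
    if(argc!=2){
	fprintf(stderr,"usage: %s <file> - compute block hashes (but don't print them)\n",argv[0]);
	exit(1);
    }
    f = fopen(argv[1],"rb");		/* binary mode: we are hashing raw bytes */
    if(!f) {
	perror(argv[1]);
	exit(1);
    }
    while(1){
	char buf[4096];
	unsigned char md[20];		/* a SHA1 digest is 20 bytes */
	size_t count = fread(buf,1,sizeof(buf),f);
	SHA_CTX c;
	if(count==0) break;		/* end of file (or read error) */
	/* SHA1_* are the SHA-1 routines; the unprefixed SHA_* calls in older OpenSSL were SHA-0 */
	SHA1_Init(&c);
	SHA1_Update(&c,buf,count);
	SHA1_Final(md,&c);
    }
    fclose(f);
    return 0;
}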

I ran the test 3 times on my Mac Pro (2x2.66 GHz Dual-Core Intel Xeons, 12GB 667 MHz DDR2 FB-DIMM memory, 1TB hard drive):

12:59 PM m:~/nps/speedtest$ time ./ctest /realistic.aff 

real	0m53.443s
user	0m25.459s
sys	0m6.113s
01:00 PM m:~/nps/speedtest$ time ./ctest /realistic.aff 

real	0m31.137s
user	0m25.327s
sys	0m5.650s
01:01 PM m:~/nps/speedtest$ time ./ctest /realistic.aff 

real	0m31.694s
user	0m25.392s
sys	0m5.920s
01:02 PM m:~/nps/speedtest$

On the first run the file was read off the disk; on the second and third runs the file was already in memory.
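
One way to see this in the numbers is to subtract CPU time (user + sys) from wall-clock time (real) for each run. For a single-threaded program, what is left over is roughly the time spent blocked, here mostly waiting for the disk. The sketch below uses the timings from the three C runs; the helper name `wait_seconds` is only illustrative.

```python
# (real, user, sys) in seconds, copied from the three C runs above
runs = [
    (53.443, 25.459, 6.113),
    (31.137, 25.327, 5.650),
    (31.694, 25.392, 5.920),
]

def wait_seconds(real, user, sys_time):
    """Wall-clock time not accounted for by CPU time (roughly, time blocked on I/O)."""
    return real - (user + sys_time)

waits = [round(wait_seconds(*r), 3) for r in runs]
print(waits)
```

The first run spends about 22 seconds waiting, while the second and third spend well under half a second, which is consistent with the file being served from memory.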

Interestingly, once the file is in memory, the entire file can be read in around 8 seconds on this hardware:

time dd if=/realistic.aff of=/dev/null bs=4096
1034890+1 records in
1034890+1 records out
4238912226 bytes transferred in 7.786416 secs (544398372 bytes/sec)

real	0m7.979s
user	0m1.455s
sys	0m6.335s
01:22 PM m:~/nps/speedtest$ 
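
To put the dd baseline next to the C result, both can be converted to throughput. This is a rough sketch using the elapsed time dd reported and the fastest cached C run; megabytes here means 10**6 bytes, and the function name is my own.

```python
FILE_SIZE = 4238912226  # bytes

def mb_per_sec(nbytes, seconds):
    """Throughput in megabytes (10**6 bytes) per second."""
    return nbytes / seconds / 1e6

dd_rate = mb_per_sec(FILE_SIZE, 7.786416)   # dd's own elapsed time
c_rate = mb_per_sec(FILE_SIZE, 31.137)      # fastest cached C run (wall clock)

print("dd: %.0f MB/s, C with SHA1: %.0f MB/s" % (dd_rate, c_rate))
```

Reading alone runs at roughly four times the rate of reading plus hashing, so most of the C program's time goes to SHA1 rather than to I/O.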

Speed in Java

So how fast is Java? Here is my Java program:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.io.*;

// Read a file in 4096-byte blocks and compute the SHA1 of each block.
public class jtest {
    public static void main(String[] args){
	if(args.length!=1){
	    System.err.println("usage: java jtest <file>");
	    System.exit(1);
	}
	long t0 = System.currentTimeMillis();	// timing starts after JVM startup
	try {
	    System.out.println("Start");
	    FileInputStream fis = new FileInputStream(new File(args[0]));
	    MessageDigest md = MessageDigest.getInstance("SHA");	// "SHA" is an alias for SHA-1
	    while(true){
		md.reset();
		byte[] buf = new byte[4096];	// a fresh buffer is allocated for every block
		int count = fis.read(buf);
		if(count==-1) break;		// end of file
		md.update(buf,0,count);
		byte[] f = md.digest();		// computed but never printed
	    }
	    fis.close();
	    System.out.println("Done");
	}
	catch (IOException e){
	    System.out.println(e);
	}
	catch (NoSuchAlgorithmException e){
	    System.out.println(e);
	}
	long t1 = System.currentTimeMillis();
	System.out.printf("Miliseconds to execute: %d\n",t1-t0);
    }
}

Notice that I have the program report how long it takes to run the benchmark, so we can factor out the cost of JVM startup.

01:38 PM m:~/nps/speedtest$ time java jtest /realistic.aff 
Start
Done
Miliseconds to execute: 98012

real	1m38.611s
user	1m26.193s
sys	0m7.176s
01:40 PM m:~/nps/speedtest$ time java jtest /realistic.aff 
Start
Done
Miliseconds to execute: 92977

real	1m34.298s
user	1m26.149s
sys	0m6.701s
01:42 PM m:~/nps/speedtest$ 

So those are pretty disappointing numbers. Java seems to be running about 3x slower than C: roughly 93 to 98 seconds, against about 31 seconds for C with the file in memory.
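
The slowdown, and the JVM startup cost that the internal timer factors out, can both be worked out from the figures above. This is a sketch of that arithmetic only; it compares Java's internally reported time against the fastest cached C run, and the variable names are mine.

```python
C_CACHED_REAL = [31.137, 31.694]     # seconds, C runs with the file in memory
JAVA_INTERNAL_MS = [98012, 92977]    # milliseconds reported by each Java run
JAVA_REAL = [98.611, 94.298]         # seconds, wall clock from `time`

c_best = min(C_CACHED_REAL)

# Java's in-program time relative to the fastest cached C run
slowdowns = [ms / 1000.0 / c_best for ms in JAVA_INTERNAL_MS]

# Wall-clock time not covered by the in-program timer (JVM startup and shutdown)
overheads = [real - ms / 1000.0 for real, ms in zip(JAVA_REAL, JAVA_INTERNAL_MS)]

print(["%.2fx" % s for s in slowdowns])
print(["%.2f s" % o for o in overheads])
```

The ratios come out at about 3.1x and 3.0x, and the time outside the timer is about 0.6 and 1.3 seconds, so JVM startup accounts for very little of the gap.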

Speed in Python

Just for kicks, I tried the same test in Python. I say "kicks" because it's not a pure Python implementation, of course: Python computes SHA1 by passing the data through to OpenSSL's C-language implementation.

Here is the program:


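import hashlib
from time import time
import sys


if __name__=="__main__":
    t0 = time()
    f = open(sys.argv[1],"rb")        # binary mode: we are hashing raw bytes
    while True:
        buf = f.read(4096)
        if len(buf)==0: break         # end of file
        result = hashlib.sha1(buf)    # the hashing itself is done in C by OpenSSL
    t1 = time()
    f.close()
    print("total time:  %s" % (t1-t0))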

Perhaps Java should adopt this strategy; here are the results:

01:48 PM m:~/nps/speedtest$ time python ptest.py /realistic.aff 
total time:  36.4702107906

real	0m36.718s
user	0m30.237s
sys	0m6.265s
01:49 PM m:~/nps/speedtest$ time python ptest.py /realistic.aff 
total time:  36.5983538628

real	0m36.654s
user	0m30.318s
sys	0m6.272s
01:50 PM m:~/nps/speedtest$ time python ptest.py /realistic.aff 
total time:  36.6046440601

real	0m36.683s
user	0m30.306s
sys	0m6.295s
01:50 PM m:~/nps/speedtest$ 
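
For comparison with the C figures, the three Python timings can be averaged and set against the fastest cached C run. Again this is only a sketch of the arithmetic, with variable names of my own choosing.

```python
c_best = 31.137   # seconds, fastest cached C run

# "total time" reported by each of the three Python runs
python_runs = [36.4702107906, 36.5983538628, 36.6046440601]

python_mean = sum(python_runs) / len(python_runs)
overhead_pct = (python_mean / c_best - 1.0) * 100.0

print("mean %.2f s, %.0f%% slower than C" % (python_mean, overhead_pct))
```

Python averages about 36.6 seconds, roughly 17% slower than the C program and far closer to it than the Java version.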

See Also

Others have looked at this and concluded that C is generally slower than Java in real-world code because C code needs defensive copying whereas Java doesn't. Another advantage of Java is that there are no buffer overflows.