Java vs. C

From Simson Garfinkel
Revision as of 18:06, 31 May 2009 by Simson

Several people have told me recently that Java runs as fast as C. After repeating this claim a few times myself, I decided to test it on a computer forensics workload.

For the test, I constructed a test file of 4238912226 bytes. The test read the file in 4 KB blocks and computed the SHA1 hash of each block.
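As a quick sanity check on the size of the workload, the arithmetic below works out how many hashes one run computes. This is a minimal sketch in Python 3; the variable names are illustrative and not part of the benchmark.

```python
# Only the two constants come from the text; the names are hypothetical.
FILE_SIZE = 4238912226   # bytes in the test file
BLOCK_SIZE = 4096        # bytes read per block

full_blocks, tail_bytes = divmod(FILE_SIZE, BLOCK_SIZE)
# A non-empty tail means one extra, shorter block is hashed as well.
total_hashes = full_blocks + (1 if tail_bytes else 0)

print(full_blocks, tail_bytes, total_hashes)
```

The file does not divide evenly into 4 KB blocks, so each run hashes one short final block in addition to the full ones.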

Speed in C

Here is the C program I used:

#include <stdio.h>
#include <stdlib.h>
#include <openssl/sha.h>

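int main(int argc,char **argv)
{
    FILE *f = 0;
    if(argc!=2){
	fprintf(stderr,"usage: %s <file> - compute block hashes (but don't print them)\n",argv[0]);
	exit(1);
    }
    f = fopen(argv[1],"rb");	/* binary mode: hash the raw bytes */
    if(!f) {
	perror(argv[1]);
	exit(1);
    }
    for(;;){
	char buf[4096];
	unsigned char md[20];	/* a SHA1 digest is 20 bytes */
	SHA_CTX c;
	size_t count = fread(buf,1,sizeof(buf),f);
	if(count==0) break;	/* end of file or read error */
	/* SHA1_* are OpenSSL's SHA-1 entry points */
	SHA1_Init(&c);
	SHA1_Update(&c,buf,count);
	SHA1_Final(md,&c);
    }
    fclose(f);
    return 0;
}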

I ran the test 3 times on my Mac Pro (2x2.66 GHz Dual-Core Intel Xeons, 12GB 667 MHz DDR2 FB-DIMM memory, 1TB hard drive):

12:59 PM m:~/nps/speedtest$ time ./ctest /realistic.aff 

real	0m53.443s
user	0m25.459s
sys	0m6.113s
01:00 PM m:~/nps/speedtest$ time ./ctest /realistic.aff 

real	0m31.137s
user	0m25.327s
sys	0m5.650s
01:01 PM m:~/nps/speedtest$ time ./ctest /realistic.aff 

real	0m31.694s
user	0m25.392s
sys	0m5.920s
01:02 PM m:~/nps/speedtest$

In the first trial the file was read off the disk; in the second and third trials the file was already in memory.
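One way to see the caching effect in the numbers is to subtract CPU time from wall-clock time. The sketch below (Python 3, with a hypothetical helper name) does this for the three C runs, assuming the program is single-threaded so that whatever is left over is time spent waiting, mostly on the disk.

```python
# (real, user, sys) in seconds, copied from the three C runs above.
c_runs = [
    (53.443, 25.459, 6.113),
    (31.137, 25.327, 5.650),
    (31.694, 25.392, 5.920),
]

def wait_seconds(real, user, sys_time):
    """Wall-clock time not spent on the CPU (assumes a single thread)."""
    return real - (user + sys_time)

waits = [round(wait_seconds(*run), 3) for run in c_runs]
print(waits)
```

The first run waits for roughly 22 seconds, while the other two wait for well under half a second, which fits the file being served from memory after the first pass.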

Interestingly, the entire file can be read in around 8 seconds on this hardware:

time dd if=/realistic.aff of=/dev/null bs=4096
1034890+1 records in
1034890+1 records out
4238912226 bytes transferred in 7.786416 secs (544398372 bytes/sec)

real	0m7.979s
user	0m1.455s
sys	0m6.335s
01:22 PM m:~/nps/speedtest$ 
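To put the dd figure next to the C benchmark, the following sketch (Python 3, illustrative names) recomputes the raw read rate and compares it with the rate of the second C run. The last line assumes that reading costs the C program about as much as it costs dd, which is only an approximation.

```python
FILE_SIZE = 4238912226      # bytes, from the dd output
DD_SECONDS = 7.786416       # transfer time reported by dd
C_WARM_REAL = 31.137        # wall-clock seconds of the second C run

dd_rate = FILE_SIZE / DD_SECONDS        # bytes per second, reading only
c_rate = FILE_SIZE / C_WARM_REAL        # bytes per second, reading plus SHA1
# Rough share of the C run that goes to reading (assumption: same read cost as dd).
read_share = DD_SECONDS / C_WARM_REAL

print(round(dd_rate / 1e6), round(c_rate / 1e6), round(read_share, 2))
```

Under that assumption, reading accounts for about a quarter of the C run and hashing for the rest.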

Speed in Java

So how fast is Java? Here is my Java program:

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.io.*;
	
public class jtest {
    public static void main(String[] args){
	long t0 = System.currentTimeMillis();
	try {
	    System.out.println("Start");
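	    FileInputStream fis = new FileInputStream(new File(args[0]));
	    // "SHA" is accepted by the JDK as an alias for SHA-1
	    MessageDigest md = MessageDigest.getInstance("SHA");
	    while(true){
		md.reset();
		byte[] buf = new byte[4096];	// fresh 4K buffer for every block
		int count = fis.read(buf);
		if(count==-1) break;		// end of file
		md.update(buf,0,count);
		byte[] f = md.digest();		// computed but never printed
	    }
	    fis.close();
	    System.out.println("Done");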
	}
	catch (IOException e){
	    System.out.println(e);
	}
	catch (NoSuchAlgorithmException e){
	    System.out.println(e);
	}
	long t1 = System.currentTimeMillis();
	System.out.printf("Miliseconds to execute: %d\n",t1-t0);
    }
}

Notice that I have the program report how long it takes to run the benchmark, so we can factor out the cost of JVM startup.

01:38 PM m:~/nps/speedtest$ time java jtest /realistic.aff 
Start
Done
Miliseconds to execute: 98012

real	1m38.611s
user	1m26.193s
sys	0m7.176s
01:40 PM m:~/nps/speedtest$ time java jtest /realistic.aff 
Start
Done
Miliseconds to execute: 92977

real	1m34.298s
user	1m26.149s
sys	0m6.701s
01:42 PM m:~/nps/speedtest$ 
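The gap between the wall-clock time from time and the milliseconds the program reports gives a rough measure of what happens outside the timed region, mainly JVM startup and shutdown. A small sketch in Python 3, with hypothetical names:

```python
# (wall-clock "real" seconds from time, milliseconds reported by jtest)
java_runs = [
    (98.611, 98012),
    (94.298, 92977),
]

def outside_seconds(real_s, reported_ms):
    """Seconds spent outside the timed region of the program."""
    return real_s - reported_ms / 1000.0

overheads = [round(outside_seconds(r, ms), 3) for r, ms in java_runs]
print(overheads)
```

Both values are around a second or less, which is small next to runs of more than 90 seconds.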

So those are pretty disappointing numbers. Java took 93 to 98 seconds against about 31 seconds for C with the file in memory, so Java seems to be running about 3x slower.
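The ratio can be checked directly from the measurements. The sketch below (Python 3, illustrative names) compares the Java runs with the second C run, both by wall-clock time and by user CPU time; picking the second C run as the baseline is my assumption, since it is the faster of the two cached runs.

```python
C_WARM = {"real": 31.137, "user": 25.327}   # second C run, seconds
JAVA_RUNS_MS = [98012, 92977]               # milliseconds reported by jtest
JAVA_USER = [86.193, 86.149]                # user seconds from time

# Slowdown by elapsed time inside the benchmark loop.
wall_ratios = [ms / 1000.0 / C_WARM["real"] for ms in JAVA_RUNS_MS]
# Slowdown by CPU time spent in user mode.
cpu_ratios = [u / C_WARM["user"] for u in JAVA_USER]

print([round(r, 2) for r in wall_ratios], [round(r, 2) for r in cpu_ratios])
```

By wall-clock time the slowdown is close to 3x; by user CPU time it is somewhat higher, around 3.4x.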

Speed in Python

Just for kicks, I tried the same test in Python. I say "kicks" because it's not a pure Python implementation, of course: Python computes SHA1 by passing the data through to OpenSSL's C-language code.

Here is the program:


import hashlib
from time import time
import sys


if __name__=="__main__":
    t0 = time()
    f = open(sys.argv[1],"rb")     # binary mode: hash the raw bytes
    while True:
        buf = f.read(4096)
        if len(buf)==0: break      # end of file
        result = hashlib.sha1(buf) # hash object for this block; never printed
    t1 = time()
    print "total time: ",t1-t0

Perhaps Java should adopt this strategy of calling into a C library; here are the results:

01:48 PM m:~/nps/speedtest$ time python ptest.py /realistic.aff 
total time:  36.4702107906

real	0m36.718s
user	0m30.237s
sys	0m6.265s
01:49 PM m:~/nps/speedtest$ time python ptest.py /realistic.aff 
total time:  36.5983538628

real	0m36.654s
user	0m30.318s
sys	0m6.272s
01:50 PM m:~/nps/speedtest$ time python ptest.py /realistic.aff 
total time:  36.6046440601

real	0m36.683s
user	0m30.306s
sys	0m6.295s
01:50 PM m:~/nps/speedtest$ 
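A rough estimate of what the Python layer adds on top of the C hashing code can be made from the user CPU times. This Python 3 sketch uses illustrative names and takes the block count from dd's record count (1034890 full records plus one partial); attributing the whole difference to per-block interpreter overhead is an assumption.

```python
BLOCKS = 1034891      # 1034890 full blocks plus one partial block
C_USER = 25.327       # user seconds, second C run
PY_USER = 30.237      # user seconds, first Python run

# Extra user CPU time per block, in microseconds.
extra_per_block_us = (PY_USER - C_USER) / BLOCKS * 1e6
# Overall slowdown in user CPU time relative to C.
slowdown = PY_USER / C_USER

print(round(extra_per_block_us, 2), round(slowdown, 2))
```

That works out to a few microseconds of extra CPU time per 4 KB block, or roughly a 20% slowdown relative to C.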

See Also

Others have looked at this and concluded that C is generally slower than Java in real-world code, because C code needs defensive copying whereas Java doesn't. Another advantage of Java is that array accesses are bounds-checked, so there are no buffer overflows.