Java vs. C
Several people have told me recently that Java runs as fast as C. After repeating this claim a few times myself, I decided to test it on a computer forensics workload.
For the test I constructed a file of 4,238,912,226 bytes. The test involved reading the file 4 KB at a time and computing the SHA-1 hash of each block.
Here is the C program I used:
#include <stdio.h>
#include <stdlib.h>
#include <openssl/sha.h>

int main(int argc, char **argv)
{
    FILE *f = 0;
    if (argc != 2) {
        fprintf(stderr, "usage: %s filename - compute block hashes (but don't print them)\n", argv[0]);
        exit(1);
    }
    f = fopen(argv[1], "r");
    if (!f) {
        perror(argv[1]);
        exit(1);
    }
    while (!feof(f)) {
        char buf[4096];
        unsigned char md[20];
        size_t count = fread(buf, 1, sizeof(buf), f);
        SHA_CTX c;
        SHA1_Init(&c);
        SHA1_Update(&c, buf, count);
        SHA1_Final(md, &c);
    }
    fclose(f);
    return 0;
}
I ran the test 3 times on my Mac Pro (2×2.66 GHz Dual-Core Intel Xeons, 12 GB 667 MHz DDR2 FB-DIMM memory, 1 TB hard drive):
12:59 PM m:~/nps/speedtest$ time ./ctest /realistic.aff

real    0m53.443s
user    0m25.459s
sys     0m6.113s

01:00 PM m:~/nps/speedtest$ time ./ctest /realistic.aff

real    0m31.137s
user    0m25.327s
sys     0m5.650s

01:01 PM m:~/nps/speedtest$ time ./ctest /realistic.aff

real    0m31.694s
user    0m25.392s
sys     0m5.920s

01:02 PM m:~/nps/speedtest$
On the first run the file was read from the disk; on the two later runs it was served from the operating system's buffer cache.
Interestingly, the entire file can be read in around 8 seconds on this hardware:
time dd if=/realistic.aff of=/dev/null bs=4096
1034890+1 records in
1034890+1 records out
4238912226 bytes transferred in 7.786416 secs (544398372 bytes/sec)

real    0m7.979s
user    0m1.455s
sys     0m6.335s
01:22 PM m:~/nps/speedtest$
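As a sanity check, the transfer rates implied by these transcripts can be recomputed from the byte count and elapsed times (a quick arithmetic sketch; all constants are copied from the transcripts above):

```java
public class Throughput {
    public static void main(String[] args) {
        long bytes = 4238912226L;   // size of the test file
        double ddSecs = 7.786416;   // raw dd read time, from the transcript
        double hashSecs = 31.137;   // warm-cache C hashing run, wall clock
        // dd reads at roughly 544 MB/s, while the C program hashes at only
        // about 136 MB/s -- so once the file is cached, the hashing run is
        // CPU-bound, not disk-bound.
        System.out.printf("dd read: %.0f bytes/sec%n", bytes / ddSecs);
        System.out.printf("C hash:  %.0f bytes/sec%n", bytes / hashSecs);
    }
}
```

In other words, SHA-1 computation, not I/O, dominates the warm-cache C runs.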
So how fast is Java? Here is my Java program:
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.io.*;

public class jtest {
    public static void main(String[] args) {
        long t0 = System.currentTimeMillis();
        try {
            System.out.println("Start");
            FileInputStream fis = new FileInputStream(new File(args[0]));
            MessageDigest md = MessageDigest.getInstance("SHA");
            while (true) {
                md.reset();
                byte[] buf = new byte[4096];
                int count = fis.read(buf);
                if (count == -1) break;
                md.update(buf, 0, count);
                byte[] f = md.digest();
            }
            fis.close();
            System.out.println("Done");
        } catch (IOException e) {
            System.out.println(e);
        } catch (NoSuchAlgorithmException e) {
            System.out.println(e);
        }
        long t1 = System.currentTimeMillis();
        System.out.printf("Milliseconds to execute: %d\n", t1 - t0);
    }
}
Notice that I have the program report how long it takes to run the benchmark, so we can factor out the cost of JVM startup.
01:38 PM m:~/nps/speedtest$ time java jtest /realistic.aff
Start
Done
Milliseconds to execute: 98012

real    1m38.611s
user    1m26.193s
sys     0m7.176s
01:40 PM m:~/nps/speedtest$ time java jtest /realistic.aff
Start
Done
Milliseconds to execute: 92977

real    1m34.298s
user    1m26.149s
sys     0m6.701s
01:42 PM m:~/nps/speedtest$
So those are pretty disappointing numbers. Java seems to be running 3x slower.
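That factor can be checked directly from the warm-cache numbers in the transcripts (a quick arithmetic sketch; 92.977 s is the in-JVM Java time of the second run, 31.137 s the second C run):

```java
public class Slowdown {
    public static void main(String[] args) {
        double javaSecs = 92.977;  // second Java run, measured inside the JVM
        double cSecs = 31.137;     // second (warm-cache) C run, wall clock
        // The ratio comes out just under 3.0.
        System.out.printf("Java/C slowdown: %.2fx%n", javaSecs / cSecs);
    }
}
```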
Second Java Try
It's possible that most of the Java overhead was in creating a new hash object each time through the loop. So I tried this version, which makes one hash object and then clones it: