Building and Running Hadoop


Check out the Hadoop source from the following Git repository: git://

I checked out the release-2.0.2-alpha tag. To save a *lot* of compile time I used the following build command line:

mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true


Attached to this page are several patches that I used when testing Hadoop. They are exploratory code, mostly untested, and should not be used for production.




Add the HyperCrc32C class, which calls native CRC functions for incremental CRC update.
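To illustrate the incremental-update interface such a class exposes, here is a minimal, unoptimised software CRC-32C (Castagnoli) sketch implementing java.util.zip.Checksum. The class name SoftCrc32C and the bit-at-a-time implementation are my own illustration; the actual HyperCrc32C patch delegates to native CRC functions rather than computing in Java, and the optimised paths (slice-by-8, NEON) discussed below are not shown here.

```java
import java.util.zip.Checksum;

// Illustrative software CRC-32C with incremental update. Hypothetical
// stand-in for the interface a class like HyperCrc32C implements; the
// real patch calls into native code instead of this bitwise loop.
public class SoftCrc32C implements Checksum {
    private static final int POLY = 0x82F63B78; // reversed CRC-32C polynomial
    private int crc = 0xFFFFFFFF;

    @Override
    public void update(int b) {
        crc ^= (b & 0xFF);
        for (int i = 0; i < 8; i++) {
            // Shift one bit; xor in the polynomial when the low bit was set.
            crc = (crc >>> 1) ^ (POLY & -(crc & 1));
        }
    }

    @Override
    public void update(byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            update(buf[i]);
        }
    }

    @Override
    public long getValue() {
        return (~crc) & 0xFFFFFFFFL; // final inversion, as unsigned 32-bit
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes();

        // Whole-buffer update in one call.
        SoftCrc32C whole = new SoftCrc32C();
        whole.update(data, 0, data.length);

        // Incremental update in two chunks must give the same value;
        // this is the property the HDFS write path relies on.
        SoftCrc32C inc = new SoftCrc32C();
        inc.update(data, 0, 4);
        inc.update(data, 4, data.length - 4);

        System.out.printf("whole=%08x incremental=%08x%n",
                whole.getValue(), inc.getValue());
    }
}
```

The standard CRC-32C check value for the ASCII string "123456789" is 0xe3069283, which this sketch reproduces.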


Add support for NEON buffer folding.


Change the HyperCrc32C class to target the NEON-optimised CRC code rather than plain slice-by-8.

Trevor Robinson's write-path optimisation patch was also investigated:

The NEON buffer folding technique is documented at:


Hadoop 2.0.2-alpha is being tested with the following single-machine configuration.

In hdfs-site.xml, the following are configured: dfs.namenode.rpc-address is set to the IP address of the dev board and an arbitrary high port, and dfs.namenode.edits.dir and the other storage directory properties are configured to point to storage on an external USB drive.

In core-site.xml, fs.defaultFS is configured to be the same as dfs.namenode.rpc-address.
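The two configuration files described above might look like the following sketch. The IP address, port, and path are placeholders for illustration; substitute the values for your own board and drive.

```xml
<!-- hdfs-site.xml: placeholder address and path, not the values used in testing -->
<configuration>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>192.168.0.10:9100</value> <!-- dev board IP, arbitrary high port -->
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>/mnt/usb/hdfs/edits</value> <!-- storage on the external USB drive -->
  </property>
</configuration>

<!-- core-site.xml: fs.defaultFS matches the namenode RPC address above -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.0.10:9100</value>
  </property>
</configuration>
```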

The nodes are brought up as follows (the datanode and namenode run on the same machine):

./bin/hdfs namenode -format (only do this once, to initialise HDFS)

./bin/hdfs namenode

./bin/hdfs datanode

TestDFSIO was used to measure the HDFS I/O rate. Two example invocations:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha-tests.jar TestDFSIO -fileSize 10GB -write


./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha-tests.jar TestDFSIO -fileSize 10GB -read

LEG/Engineering/BigData/hadoopbuildrun (last modified 2016-03-21 23:06:43)