Building and Running Hadoop
Check out the Hadoop source from Git: git clone git://github.com/apache/hadoop-common.git
I checked out the release-2.0.2-alpha tag. To save a *lot* of compile time I used the following build command line:
mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true
Attached to this page are several patches that I used when testing Hadoop. They are exploratory code, mostly untested, and should not be used in production. The patches:
 * Add the HyperCrc32C class, which calls native CRC functions for incremental CRC updates.
 * Add support for NEON buffer folding.
 * Change the HyperCrc32C class to target the NEON-optimised CRC code rather than plain slice-by-8.
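The incremental-update interface that a class like HyperCrc32C would expose can be illustrated with a pure-Java sketch: the class below implements java.util.zip.Checksum using a plain bitwise CRC32C loop, i.e. the slow software path that the native slice-by-8 and NEON-optimised code is meant to replace. The class name SoftCrc32C and the implementation are illustrative only, not code from the attached patches.

```java
import java.util.zip.Checksum;

// A minimal software CRC32C (Castagnoli) supporting incremental update.
// The bitwise loop is deliberately simple; the patches replace this kind
// of per-byte work with slice-by-8 tables or NEON-accelerated native code.
final class SoftCrc32C implements Checksum {
    // Reflected form of the Castagnoli polynomial 0x1EDC6F41.
    private static final int POLY = 0x82F63B78;
    private int crc = 0xFFFFFFFF;

    @Override
    public void update(int b) {
        crc ^= (b & 0xFF);
        for (int i = 0; i < 8; i++) {
            crc = ((crc & 1) != 0) ? (crc >>> 1) ^ POLY : crc >>> 1;
        }
    }

    @Override
    public void update(byte[] buf, int off, int len) {
        for (int i = off; i < off + len; i++) {
            update(buf[i]);
        }
    }

    @Override
    public long getValue() {
        // Final inversion; return as an unsigned 32-bit value.
        return (~crc) & 0xFFFFFFFFL;
    }

    @Override
    public void reset() {
        crc = 0xFFFFFFFF;
    }
}
```

Because the intermediate state is kept across calls, feeding a buffer in several chunks yields the same checksum as one pass over the whole buffer, which is what "incremental CRC update" buys the HDFS checksum path.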
Trevor Robinson's write-path optimisation patch was also investigated: https://issues.apache.org/jira/browse/HDFS-3529
The NEON buffer folding is documented at: https://wiki.linaro.org/LEG/Engineering/CRC
Hadoop 2.0.2-alpha was tested with the following single-machine configuration.
In hdfs-site.xml, the following properties are configured:
 * dfs.namenode.rpc-address: the IP address of the dev board and an arbitrary high port.
 * dfs.namenode.name.dir, dfs.namenode.edits.dir and dfs.datanode.data.dir: storage on an external USB drive.
In core-site.xml, fs.defaultFS is configured to be the same as dfs.namenode.rpc-address.
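The two files described above look roughly as follows. This is a minimal sketch: the address, port and paths are placeholders, not the values used in these tests.

```xml
<!-- hdfs-site.xml (placeholder address, port and paths) -->
<configuration>
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>192.168.1.100:54310</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/mnt/usb/hdfs/name</value>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>/mnt/usb/hdfs/edits</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/usb/hdfs/data</value>
  </property>
</configuration>
```

```xml
<!-- core-site.xml: fs.defaultFS must match dfs.namenode.rpc-address above -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.100:54310</value>
  </property>
</configuration>
```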
The nodes are brought up as follows (the datanode and namenode run on the same machine):
./bin/hdfs namenode -format (only do this once, to initialise HDFS)
./bin/hdfs namenode
./bin/hdfs datanode
TestDFSIO was used to test the HDFS I/O rate. Two example invocations:
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha-tests.jar TestDFSIO -fileSize 10GB -write
./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha-tests.jar TestDFSIO -fileSize 10GB -read
LEG/Engineering/BigData/hadoopbuildrun (last modified 2016-03-21 23:06:43)