Building and Running Hadoop

Building

Check out the Hadoop source from the following Git repository: git://github.com/apache/hadoop-common.git

I checked out the release-2.0.2-alpha tag. To save a *lot* of compile time, I used the following build command line:

mvn package -Pdist,native -DskipTests -Dtar -Dmaven.javadoc.skip=true

Patches

Attached to this page are several patches that I used when testing Hadoop. They are exploratory code, mostly untested, and should not be used for production.

Patch: 0001-Introduce-the-HyperCrc32C-Checksum-class.patch
Description: Add the HyperCrc32C class, which calls native CRC functions for incremental CRC update.

Patch: 0002-libhadoop-CRC-ARM-NEON-support.patch
Description: Add support for NEON buffer folding.

Patch: 0003-Modify-HyperCRC-to-target-NEON-routine.patch
Description: Change the HyperCrc32C class to target the NEON-optimised CRC code rather than plain slice-by-8.

Trevor Robinson's write-path optimisation patch was also investigated: https://issues.apache.org/jira/browse/HDFS-3529

The NEON buffer folding approach is documented at: https://wiki.linaro.org/LEG/Engineering/CRC

Running

Hadoop 2.0.2-alpha is being tested with the following single-machine configuration.

In hdfs-site.xml, the following properties are configured: dfs.namenode.rpc-address is set to the IP address of the dev board plus an arbitrary high port, and dfs.namenode.name.dir, dfs.namenode.edits.dir, and dfs.datanode.data.dir point to storage on an external USB drive.
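For illustration, a minimal hdfs-site.xml along those lines might look like the sketch below; the address, port, and /mnt/usb paths are placeholders, not the values actually used on the board.

<configuration>
  <!-- Placeholder: the dev board's IP address and an arbitrary high port -->
  <property>
    <name>dfs.namenode.rpc-address</name>
    <value>192.168.1.100:9000</value>
  </property>
  <!-- Placeholder directories on the external USB drive -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/mnt/usb/hdfs/name</value>
  </property>
  <property>
    <name>dfs.namenode.edits.dir</name>
    <value>/mnt/usb/hdfs/edits</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/mnt/usb/hdfs/data</value>
  </property>
</configuration>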

In core-site.xml, fs.defaultFS is set to the same address as dfs.namenode.rpc-address.
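The matching core-site.xml entry, again with the same placeholder address, would be something like:

<configuration>
  <!-- Placeholder: same host:port as dfs.namenode.rpc-address, with the hdfs:// scheme -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.1.100:9000</value>
  </property>
</configuration>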

The nodes are brought up as follows (the datanode and namenode run on the same machine):

./bin/hdfs namenode -format    (only do this once, to initialise HDFS)
./bin/hdfs namenode
./bin/hdfs datanode

TestDFSIO was used to test the HDFS I/O rate. Two example invocations:

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha-tests.jar TestDFSIO -fileSize 10GB -write

and,

./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.0.2-alpha-tests.jar TestDFSIO -fileSize 10GB -read
