H2O Install and Run Guide

Install Guide

Note: Vanilla Apache Hadoop is not supported. Only the Hortonworks, Cloudera, and MapR Hadoop distributions are supported.
Video: https://www.youtube.com/watch?v=B1ax_k_sSoY

  1. Install and run Hortonworks HDP 2.2 on all nodes. (Contact Steve Capper for the ARM64 source of HDP; you can build it the same way you build vanilla Hadoop.)
  2. Make sure there are no errors in the Hadoop logs, and check Hadoop health (especially the ResourceManager and NodeManager logs).
  3. Make sure the system time is synchronized across all Hadoop nodes.
  4. Download H2O on your master machine from http://h2o.ai/download/

    1. Go to the above link
    2. Under the Download H2O section, click the latest stable release link
    3. Then click the 'Install on Hadoop' tab
    4. Follow the instructions on that page and download H2O for Hadoop using the link that matches your Hadoop distribution
  5. Run H2O with the following command from H2O's directory:
    • $HADOOP_HOME/bin/hadoop jar h2odriver.jar -nodes 6 -mapperXmx 25g -timeout 1800 -network '' -output hdfsOutput_h2o_01

    • This starts a REST API on the cluster.
    • H2O is started as a mapper on each node.
    • Set -mapperXmx to suit your system. It is the memory allocated to the mapper process on each node; a minimum of 6g per node is recommended.
    • With multiple nodes, bringing up the H2O cluster usually takes longer than the default timeout of 120 s, so set a high -timeout value.
    • The -output directory works like the output directory of any other Hadoop job: use a new directory for each run so that old results are not overwritten.
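
The launch in step 5 can be wrapped in a small script so that every run gets a fresh -output directory. The following is a minimal sketch, assuming a POSIX shell, that HADOOP_HOME is set, and that h2odriver.jar is in the current directory. The helper names h2o_output_dir and h2o_launch, the DRY_RUN switch, and the timestamp naming scheme are illustrative and not part of H2O or Hadoop. The -network option from the command above is left out of the sketch; add it back if your setup needs it.

```shell
#!/bin/sh
# Sketch only: the helper names, DRY_RUN, and the timestamp naming scheme
# are illustrative assumptions, not part of H2O or Hadoop.

# Print a fresh output directory name for this run.
# $1 (optional) overrides the timestamp suffix.
h2o_output_dir() {
  printf 'hdfsOutput_h2o_%s\n' "${1:-$(date +%Y%m%d_%H%M%S)}"
}

# Launch H2O with the flags used in step 5.
# $1 = node count, $2 = per-node mapper memory, $3 = timeout in seconds,
# $4 = output directory.
# Set DRY_RUN=1 to print the command instead of running it.
h2o_launch() {
  ${DRY_RUN:+echo} "$HADOOP_HOME/bin/hadoop" jar h2odriver.jar \
    -nodes "$1" -mapperXmx "$2" -timeout "$3" -output "$4"
}
```

For example, `h2o_launch 6 25g 1800 "$(h2o_output_dir)"` would start six mappers with 25g each and a 1800 s timeout, writing to a directory named after the current time. Running it once with DRY_RUN=1 first shows the exact command without touching the cluster.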

Flow Guide and Airlines Demo

  1. Once H2O is running, open port 54321 of the machine on which H2O is running in your web browser, for example http://<master-host>:54321. (Note: H2O does not necessarily start its REST API on the master machine. Check the standard output in your terminal when you start H2O for the actual address.)

  2. The above step loads H2O Flow, an open source web UI, in the browser.
  3. Video for H2O Flow: https://www.youtube.com/watch?v=wzeuFfbW7WE

  4. Video for Airlines demo: https://www.youtube.com/watch?v=5UCZngHX7EI

  5. An older video for the Airlines demo: https://www.youtube.com/watch?v=bInMSgZhDd4

  6. Play around with the Cluster Status and Water Meter tools available in the 'Admin' toolbar menu
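
Before opening the browser, it can help to confirm from a terminal that something is answering on the Flow port. The following is a minimal sketch, assuming a POSIX shell and curl. The helper names are illustrative, and the /3/Cloud path used for the cluster status query is an assumption about the REST API version of your H2O release, so check it against the documentation for the version you downloaded.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative. The /3/Cloud path is an
# assumption about the REST API version of the installed H2O release.

# Print the Flow URL for a host ($1) and optional port ($2, default 54321).
h2o_flow_url() {
  printf 'http://%s:%s\n' "$1" "${2:-54321}"
}

# Succeed if an HTTP server answers on the Flow URL within 5 seconds.
h2o_flow_reachable() {
  curl --silent --fail --max-time 5 --output /dev/null \
    "$(h2o_flow_url "$1" "${2:-}")"
}

# Print the raw cluster status response from the REST API.
h2o_cloud_status() {
  curl --silent --fail --max-time 5 "$(h2o_flow_url "$1" "${2:-}")/3/Cloud"
}
```

For example, `h2o_flow_reachable <host> && h2o_cloud_status <host>`, where <host> is the address reported in the terminal output when H2O started.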

LEG/Engineering/BigData/H2OInstallAndRunGuide (last modified 2016-03-21 23:03:09)