Hadoop Appliance

Setting up Hadoop clusters and operating them properly involves many difficulties and costs. Installing operating systems and the required software on many machines is highly repetitive work, as is running jobs through a command-line interface. Hadoop Appliance addresses these problems with a web-based UI that includes various management functions and extensions to Hadoop's original functions.

Overview

  • Installation and management of a cluster
  • Setup and management of multiple Hadoop clusters
  • Web-based UI for the functions that Hadoop offers
  • Extensions to Hadoop itself
  • Hadoop HA (High Availability)

Features

Server Provisioning

  • Automatic server provisioning : supports fast, parallel deployment via network booting, with logging
  • DHCP-based IP management : maps IP addresses to MAC addresses or configures individual hosts
  • Internal DNS : provides internal domain names without modifying hosts files, which increases security
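
The IP-MAC mapping described above can be illustrated with a minimal sketch. It assumes that each node's MAC address is known in advance and that fixed addresses are handed out sequentially from one subnet. The function names, the `node` naming prefix, and the ISC-dhcpd-style `host` output are illustrative assumptions, not the appliance's actual implementation.

```python
import ipaddress


def assign_addresses(macs, subnet, first_host=10):
    """Map each MAC address to a fixed IP drawn sequentially from the subnet.

    first_host is the 1-based position of the first usable address to hand out
    (an assumed convention that leaves room for gateways and the appliance).
    """
    network = ipaddress.ip_network(subnet)
    hosts = list(network.hosts())[first_host - 1:]
    if len(macs) > len(hosts):
        raise ValueError("subnet too small for the given MAC list")
    return {mac.lower(): str(ip) for mac, ip in zip(macs, hosts)}


def render_host_entries(mapping, prefix="node"):
    """Render the mapping as ISC-dhcpd-style host declarations (assumed format)."""
    entries = []
    for index, (mac, ip) in enumerate(mapping.items(), start=1):
        name = f"{prefix}{index:02d}"
        entries.append(
            f"host {name} {{\n"
            f"  hardware ethernet {mac};\n"
            f"  fixed-address {ip};\n"
            f"}}"
        )
    return "\n".join(entries)


if __name__ == "__main__":
    mapping = assign_addresses(
        ["00:11:22:33:44:55", "00:11:22:33:44:56"], "10.0.0.0/24"
    )
    print(render_host_entries(mapping))
```

Keeping the mapping in one place means the same table could also feed the internal DNS records, so that host names and addresses stay consistent without editing hosts files on each node.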

Cluster Dashboard

  • Whole cluster dashboard : provides information about the whole cluster
  • Individual Hadoop cluster dashboard : provides status information for each node, MapReduce jobs, and HDFS disk usage, as well as the NameNode and TaskTrackers

MapReduce Job Management

  • Web-based job management : submitting jar files and viewing logs on the web
  • Setting input/output paths via HDFS browser
  • Canceling submitted jobs

HDFS Browser

  • Web-based file/directory browsing
  • Viewing file contents on the web
  • Web-based uploading and downloading of files
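
How a web-based browser might list an HDFS directory can be sketched with Hadoop's WebHDFS REST interface. Whether Hadoop Appliance uses WebHDFS is not stated here, so treat this as an assumption. The host name, port, and helper names are hypothetical, and the sketch only builds the request URL and parses a `LISTSTATUS` response rather than contacting a cluster.

```python
import json
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS request URL for the given HDFS path and operation."""
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def parse_listing(body):
    """Extract (name, type, length) tuples from a LISTSTATUS JSON response."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["length"]) for s in statuses]


if __name__ == "__main__":
    # Hypothetical NameNode address; adjust to the actual cluster.
    print(webhdfs_url("namenode.example", 50070, "/user/data", "LISTSTATUS"))

    # Hand-written sample response in the LISTSTATUS JSON shape.
    sample = json.dumps({"FileStatuses": {"FileStatus": [
        {"pathSuffix": "logs", "type": "DIRECTORY", "length": 0},
        {"pathSuffix": "part-00000", "type": "FILE", "length": 1024},
    ]}})
    for name, kind, length in parse_listing(sample):
        print(kind, length, name)
```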

Hadoop NameNode HA

  • Detection of NameNode failures via heartbeat and HDD S.M.A.R.T. data
  • Automatic IP switchover to the standby node when a failure is detected
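
The failure detection logic above can be sketched as follows. This is a minimal illustration, assuming a failure is declared when no heartbeat arrives within a timeout or when the disk's S.M.A.R.T. status is reported unhealthy. The class, the timeout value, and the `active`/`standby` labels are hypothetical, and the actual IP takeover step is not shown.

```python
from dataclasses import dataclass


@dataclass
class NameNodeHealth:
    """Tracks the two failure signals: heartbeat age and S.M.A.R.T. status."""

    heartbeat_timeout: float   # seconds without a heartbeat before failure (assumed)
    last_heartbeat: float = 0.0
    smart_ok: bool = True

    def record_heartbeat(self, now):
        """Store the time of the most recent heartbeat."""
        self.last_heartbeat = now

    def record_smart(self, healthy):
        """Store the latest S.M.A.R.T. health result for the NameNode disk."""
        self.smart_ok = healthy

    def has_failed(self, now):
        """Fail on a stale heartbeat or an unhealthy disk."""
        stale = (now - self.last_heartbeat) > self.heartbeat_timeout
        return stale or not self.smart_ok


def select_service_owner(health, now, active="active", standby="standby"):
    """Return which node should hold the service IP at time `now`."""
    return standby if health.has_failed(now) else active


if __name__ == "__main__":
    health = NameNodeHealth(heartbeat_timeout=10.0)
    health.record_heartbeat(100.0)
    print(select_service_owner(health, 105.0))  # heartbeat still fresh
    print(select_service_owner(health, 111.0))  # heartbeat older than the timeout
```

Passing the current time in as an argument, rather than reading the clock inside the class, keeps the decision logic easy to test without waiting for real timeouts.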

Resource & Server Monitoring

  • Real-time system monitoring : includes analysis of changes over time using RRD (round-robin database)
  • Hadoop monitoring : collects Hadoop status and metrics, integrated with the Hadoop management features
  • Graphs : generates graphs of analysis results for various metrics and provides a web UI for them
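
The idea behind round-robin storage of metrics can be shown with a small sketch. It assumes a fixed number of slots, where each slot holds the average of a fixed number of raw samples and the oldest slot is overwritten once the archive is full. The class and its parameters are illustrative only and do not reproduce the file format or API of any particular RRD tool.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size archive of averaged samples; old rows are overwritten."""

    def __init__(self, slots, step):
        self.step = step                 # raw samples consolidated into one row
        self.rows = deque(maxlen=slots)  # deque drops the oldest row when full
        self._pending = []

    def update(self, value):
        """Add a raw sample; emit an averaged row every `step` samples."""
        self._pending.append(value)
        if len(self._pending) == self.step:
            self.rows.append(sum(self._pending) / self.step)
            self._pending.clear()

    def change(self):
        """Difference between the newest and oldest stored rows, if available."""
        if len(self.rows) < 2:
            return None
        return self.rows[-1] - self.rows[0]


if __name__ == "__main__":
    archive = RoundRobinArchive(slots=3, step=2)
    for sample in [1, 3, 5, 7, 9, 11, 13, 15]:
        archive.update(sample)
    print(list(archive.rows), archive.change())
```

Because the archive never grows beyond its slot count, storage stays constant no matter how long monitoring runs, which suits long-lived cluster metrics.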

System Architecture

Effects

  • Reduced cluster deployment time through automatic server provisioning and cluster setup
  • More efficient management with a single system that handles both nodes and Hadoop
  • Increased productivity with convenient MapReduce job management and the HDFS browser
  • Improved accessibility with web-based UI