arundhaj

regression towards the datascience

Building Hadoop source code

 

Apache Hadoop is a framework for distributed processing of large data sets across clusters of computers using MapReduce.

The steps below build and package Hadoop from source code. This guide assumes a fresh installation of Ubuntu 14.04.

  1. Let's start by installing Oracle Java. First, navigate to the home folder.
cd
mkdir installations && cd installations
wget --no-check-certificate --no-cookies --header "Cookie:oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/7u71-b14/jdk-7u71-linux-x64.tar.gz
sudo mkdir /usr/lib/jvm
sudo tar xvzf jdk-7u71-linux-x64.tar.gz -C /usr/lib/jvm/
  • Next, install Apache Maven, Hadoop's main build system.
wget http://apache.tradebit.com/pub/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
sudo tar xvzf apache-maven-3.2.3-bin.tar.gz -C /usr/local/
  • Now set the environment variables by appending the following lines to the end of the ~/.bashrc file. Then run source ~/.bashrc, or open a new shell, so that they take effect.
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
export M2_HOME=/usr/local/apache-maven-3.2.3
export PATH=$PATH:$JAVA_HOME/bin:$M2_HOME/bin
  • Let's check that Java and Maven are installed correctly.
java -version
mvn --version
  • Install the following dependencies.
sudo apt-get update
sudo apt-get -y install build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev rsync openssh-server
  • Set up SSH for password-less login.
cd
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  • Check that you can ssh in without a password. Make sure to exit afterwards to return to the main shell.
ssh localhost
  • Install protobuf v2.5.0. The Hadoop 2.5.2 build expects exactly this version of protoc.
mkdir source_code && cd source_code
wget https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
tar xvzf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
make check
sudo make install
sudo ldconfig
protoc --version
  • Get the Hadoop source code and build it. You can clone the repository from source control; for this tutorial, I've downloaded the source tarball.
wget http://apache.claz.org/hadoop/common/hadoop-2.5.2/hadoop-2.5.2-src.tar.gz
tar xvzf hadoop-2.5.2-src.tar.gz
cd hadoop-2.5.2-src
mvn package -Pdist -Dtar -DskipTests
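
Before starting the Maven build above, it can help to confirm that the prerequisites from the earlier steps are actually on the PATH. The following is only a rough sketch: the function names and the REQUIRED_PROTOC variable are made up for illustration, and it assumes that protoc --version prints a line of the form "libprotoc 2.5.0".

```shell
# Pre-flight sketch: report whether the build tools from the steps above are on PATH.
# REQUIRED_PROTOC and both function names are illustrative, not part of Hadoop.
REQUIRED_PROTOC="2.5.0"

# Succeeds only if the given `protoc --version` output names the required release.
# Assumed input format: "libprotoc 2.5.0".
protoc_version_ok() {
    [ "$1" = "libprotoc ${REQUIRED_PROTOC}" ]
}

# Prints one status line per tool; returns non-zero if any tool is missing.
report_tools() {
    rt_missing=0
    for rt_tool in "$@"; do
        if command -v "$rt_tool" >/dev/null 2>&1; then
            echo "found:   $rt_tool"
        else
            echo "missing: $rt_tool"
            rt_missing=1
        fi
    done
    return "$rt_missing"
}

report_tools java mvn protoc cmake || echo "Install the missing tools before building."

# Only compare versions when protoc is actually installed.
if command -v protoc >/dev/null 2>&1; then
    if protoc_version_ok "$(protoc --version)"; then
        echo "protoc version OK"
    else
        echo "protoc is not ${REQUIRED_PROTOC}; the Hadoop build will likely reject it."
    fi
fi
```

If any line reads "missing", revisit the matching step above before running mvn package again.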

Depending on your internet speed, the build may take anywhere from 10 minutes to 1 hour. Once the build succeeds, the distribution is available at hadoop-dist/target/hadoop-2.5.2.tar.gz
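
As a quick sanity check of the result, the tarball can be unpacked and asked for its version. This is a minimal sketch, assuming it is run from inside the hadoop-2.5.2-src directory and that JAVA_HOME is set as described earlier. SRC_DIR, INSTALL_DIR and dist_tarball are hypothetical names chosen for this example, and ~/installations is simply the folder created in the first step.

```shell
# Sketch: locate and unpack the distribution produced by the build.
# HADOOP_VERSION, SRC_DIR, INSTALL_DIR and dist_tarball are illustrative names.
HADOOP_VERSION="2.5.2"
SRC_DIR="${SRC_DIR:-.}"                            # assumed: current dir is hadoop-2.5.2-src
INSTALL_DIR="${INSTALL_DIR:-$HOME/installations}"  # assumed unpack location

# Prints the tarball path expected for a given source directory and version.
dist_tarball() {
    echo "$1/hadoop-dist/target/hadoop-$2.tar.gz"
}

TARBALL="$(dist_tarball "$SRC_DIR" "$HADOOP_VERSION")"

if [ -f "$TARBALL" ]; then
    mkdir -p "$INSTALL_DIR"
    tar xzf "$TARBALL" -C "$INSTALL_DIR"
    # Ask the freshly built distribution to print its version banner.
    "$INSTALL_DIR/hadoop-${HADOOP_VERSION}/bin/hadoop" version
else
    echo "Not found: $TARBALL (did the build finish successfully?)"
fi
```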

Hope this helps.
