Setting up EC2 account and tools
tar -xzvf hadoop-1.0.0.tar.gz
Create hadoop-ec2 initialization script
vi hadoop-ec2-init.sh (you can use your preferred editor)
export HADOOP_EC2_BIN=~/hadoop-1.0.0/src/contrib/ec2/bin
export PATH=$PATH:$HADOOP_EC2_BIN
source hadoop-ec2-init.sh
This needs to be done at every login
Alternatively, source it from ~/.profile so it runs automatically at login
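The two options above can be combined into one small helper. This is only a sketch: the function name install_hadoop_ec2_init and the example paths are assumptions for illustration, not part of the Hadoop EC2 scripts.

```shell
# Hypothetical helper (not part of Hadoop): write the init script and make a
# profile file source it. Both paths are arguments so nothing is hard-coded.
install_hadoop_ec2_init() {
    init_script=$1    # e.g. "$HOME/hadoop-ec2-init.sh"
    profile_file=$2   # e.g. "$HOME/.profile"

    # Same two exports as in the steps above. The quoted 'EOF' keeps $PATH
    # and ~ unexpanded until the script is actually sourced.
    cat > "$init_script" <<'EOF'
export HADOOP_EC2_BIN=~/hadoop-1.0.0/src/contrib/ec2/bin
export PATH=$PATH:$HADOOP_EC2_BIN
EOF

    # Append the source line only if it is not already present, so running
    # the helper twice does not duplicate it.
    line=". $init_script"
    if ! grep -qxF "$line" "$profile_file" 2>/dev/null; then
        echo "$line" >> "$profile_file"
    fi
}

# Usage (assumed paths):
# install_hadoop_ec2_init "$HOME/hadoop-ec2-init.sh" "$HOME/.profile"
```

Taking the paths as arguments makes it easy to try the helper against scratch files before touching the real ~/.profile.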
Configure hadoop with EC2 account
vi ~/hadoop-1.0.0/src/contrib/ec2/bin/hadoop-ec2-env.sh
AWS_ACCOUNT_ID=283072064258
AWS_ACCESS_KEY_ID=<from Dr. Jin’s email>
Looks like AKIAJ5U4QYDDZCNDDY5Q
AWS_SECRET_ACCESS_KEY=<from email>
Looks like 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
KEY_NAME=<group>-keypair
The same keypair you set up earlier at ~/.ec2/id_rsa-<group>-keypair
Create/launch cluster
hadoop-ec2 launch-cluster <group>-cluster 2
Can take 10-20 minutes!
Keep an eye on it from the AWS -> EC2 console tab
Note your master node’s DNS name; you’ll need it later
Looks like: ec2-107-21-112-172.compute-1.amazonaws.com
Test login to master node
hadoop-ec2 login <group>-cluster
Troubleshooting: If you didn’t set up your keypair properly, you’ll get:
[ec2-user@ip-10-123-22-179 ~]$ hadoop-ec2 login test-cluster
Logging in to host ec2-107-21-112-172.compute-1.amazonaws.com.
Warning: Identity file /home/ec2-user/.ec2/id_rsa-<group>-keypair not accessible: No such file or directory.
Permission denied (publickey,gssapi-with-mic).
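The error above means the identity file was not found at the expected path. A quick pre-check can catch this before logging in. The sketch below is a minimal example; check_keypair is a hypothetical helper name, and the path passed to it is whatever you used when creating the keypair.

```shell
# Hypothetical helper: verify the private key file that hadoop-ec2 will use.
check_keypair() {
    key_file=$1   # e.g. "$HOME/.ec2/id_rsa-<group>-keypair"

    # The file must exist as a regular file...
    if [ ! -f "$key_file" ]; then
        echo "missing: $key_file" >&2
        return 1
    fi
    # ...and be readable by the current user.
    if [ ! -r "$key_file" ]; then
        echo "not readable: $key_file" >&2
        return 1
    fi
    echo "ok: $key_file"
}

# Usage (replace <group> with your group name):
# check_keypair "$HOME/.ec2/id_rsa-<group>-keypair"
```

If the file exists but ssh still refuses it, overly open permissions are a common cause; chmod 600 on the key file usually resolves that.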
Troubleshooting: http://wiki.apache.org/hadoop/AmazonEC2
Running a Map/Reduce Job
Copy the jar file to the master-node
scp -i ~/.ec2/id_rsa-<group>-keypair hadoop-1.0.0/hadoop-examples-1.0.0.jar root@<master node>:/tmp
Get your master node name from the output of the ‘hadoop-ec2 login <group>-cluster’ command; it will look something like this:
ec2-107-21-112-172.compute-1.amazonaws.com
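Because the scp command combines the group name, the master host name, and the file, it is easy to mistype. The sketch below only prints the command so it can be reviewed before running it. print_scp_cmd and the group name mygroup are assumptions for illustration.

```shell
# Hypothetical helper: print (not run) the scp command for a given group,
# master host and local file, using the key path convention from above.
print_scp_cmd() {
    group=$1    # your group name
    master=$2   # master node DNS name
    file=$3     # local file to copy to /tmp on the master
    printf 'scp -i ~/.ec2/id_rsa-%s-keypair %s root@%s:/tmp\n' \
        "$group" "$file" "$master"
}

# Example with the host name shown above and an assumed group "mygroup":
print_scp_cmd mygroup ec2-107-21-112-172.compute-1.amazonaws.com \
    hadoop-1.0.0/hadoop-examples-1.0.0.jar
```

Once the printed line looks right, paste it into the shell to perform the copy.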
(Optional) Copy your HDFS files to the master-node
Compress data for faster transfer
tar -cjvf data.bz2 <data-dir>
scp -i ~/.ec2/id_rsa-<group>-keypair data.bz2 root@<master node>:/tmp
On the master node, upload the data to HDFS (HDFS is already set up on the nodes)
hadoop fs -put /tmp/<data-file>
Login to the master node
hadoop-ec2 login <group>-cluster
Run the Map/Reduce job
hadoop jar /tmp/hadoop-examples-1.0.0.jar pi 10 10000000
Track task progress from the web
http://<master node>:50030
E.g. http://ec2-107-21-112-172.compute-1.amazonaws.com:50030
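If the master node name is kept in a shell variable, the tracking URL can be derived from it instead of being retyped. A minimal sketch, where jobtracker_url and the MASTER variable are hypothetical names:

```shell
# Hypothetical helper: build the tracking URL (port 50030, as above)
# from a master node DNS name.
jobtracker_url() {
    printf 'http://%s:50030\n' "$1"
}

# Example with the host name used in this guide:
MASTER=ec2-107-21-112-172.compute-1.amazonaws.com
jobtracker_url "$MASTER"
```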