Setting up Amazon EC2 and using Hadoop

Posted in Daily life, Hadoop, Linux Server

Setting up EC2 account and tools

Create AMI signing certificate

mkdir ~/.ec2

cd ~/.ec2

openssl genrsa -des3 -out pk.pem 2048

openssl rsa -in pk.pem -out pk-unencrypt.pem

openssl req -new -x509 -key pk.pem -out cert.pem -days 1095

Share all three .pem files manually with group members

Troubleshooting: If your client's date is wrong, your certs will not work
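If a cert is being rejected, one quick check (a minimal sketch using standard openssl flags) is to compare the certificate's validity window against your system clock:

# Print the certificate's subject and notBefore/notAfter dates
openssl x509 -in ~/.ec2/cert.pem -noout -subject -dates

# Compare against the current time (UTC)
date -u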



Upload certificate to AWS via IAM page


Account: 123456

Username: group** (e.g. group1, group5, group10)

Password: xxxxxxxxxxxxx


Click the IAM tab -> Users -> select yourself (use the right arrow if needed)

In bottom pane select “Security Credentials” tab and click “Manage Signing Certificates”

Click “Upload Signing Certificate”

cat ~/.ec2/cert.pem

Copy contents into ‘Certificate Body’ textbox and click ‘OK’



Retrieve and unpack AWS tools

wget http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip

unzip ec2-api-tools.zip

Create ec2 initialization script

vi ec2-init.sh (you can use your preferred editor)

export JAVA_HOME=/usr

export EC2_HOME=~/ec2-api-tools-1.5.2.4

export PATH=$PATH:$EC2_HOME/bin

export EC2_PRIVATE_KEY=~/.ec2/pk-unencrypt.pem

export EC2_CERT=~/.ec2/cert.pem

source ec2-init.sh

This will need to be done every login

Alternatively, put it in ~/.profile to have it done automatically on login
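For example, a minimal sketch to automate this (assuming ec2-init.sh is in your home directory):

echo 'source ~/ec2-init.sh' >> ~/.profile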

Test it out

ec2-describe-regions

ec2-describe-images -o self -o amazon
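If everything is wired up correctly, ec2-describe-regions prints one REGION line per region, something like this (illustrative output; the exact list depends on your tools version):

REGION  us-east-1  ec2.us-east-1.amazonaws.com
REGION  eu-west-1  ec2.eu-west-1.amazonaws.com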

Create a new keypair (allows cluster login)

ec2-add-keypair <group>-keypair | grep -v KEYPAIR > ~/.ec2/id_rsa-<group>-keypair

chmod 600 ~/.ec2/id_rsa-<group>-keypair

Only do this once! It will create a new keypair in AWS every time you run it

Share private key file between group members, keep it private

Don’t delete other groups’ keypairs!

Everyone has access to everyone else’s keypairs from the AWS console

EC2 tab -> Network and Security -> Key Pairs
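To double-check your keypair from the command line (a sketch; the keypair name is whatever you passed to ec2-add-keypair):

# List keypairs registered with AWS
ec2-describe-keypairs

# The saved private key file should begin with a PEM header
head -1 ~/.ec2/id_rsa-<group>-keypair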


Setting up Hadoop for EC2


Retrieve hadoop tools
wget http://download.nextag.com/apache//hadoop/core/hadoop-1.0.0/hadoop-1.0.0.tar.gz
tar -xzvf hadoop-1.0.0.tar.gz
Create hadoop-ec2 initialization script
vi hadoop-ec2-init.sh (you can use your preferred editor)
export HADOOP_EC2_BIN=~/hadoop-1.0.0/src/contrib/ec2/bin
export PATH=$PATH:$HADOOP_EC2_BIN
source hadoop-ec2-init.sh
This will need to be done every login
Alternatively, put it in ~/.profile to have it done automatically on login
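As with ec2-init.sh, a quick sketch to automate this and confirm the script is on your PATH (assuming hadoop-ec2-init.sh is in your home directory):
echo 'source ~/hadoop-ec2-init.sh' >> ~/.profile
which hadoop-ec2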
Configure hadoop with EC2 account
vi ~/hadoop-1.0.0/src/contrib/ec2/bin/hadoop-ec2-env.sh
AWS_ACCOUNT_ID=283072064258
AWS_ACCESS_KEY_ID=
Looks like AKIAJ5U4QYDDZCNDDY5Q
AWS_SECRET_ACCESS_KEY=
Looks like 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
KEY_NAME=<group>-keypair
The same keypair you set up earlier at ~/.ec2/id_rsa-<group>-keypair
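Taken together, the edited lines of hadoop-ec2-env.sh should end up looking roughly like this (all values here are illustrative placeholders, not working credentials):
AWS_ACCOUNT_ID=123456789012
AWS_ACCESS_KEY_ID=AKIAXXXXXXXXXXXXXXXX
AWS_SECRET_ACCESS_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
KEY_NAME=<group>-keypair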
Create/launch cluster
hadoop-ec2 launch-cluster <group>-cluster 2
Can take 10-20 minutes!
Keep an eye on it from the AWS -> EC2 console tab
Note your master node DNS name, you’ll need it later
Looks like: ec2-107-21-112-172.compute-1.amazonaws.com
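If you lose track of the master's DNS name, you can recover it with the API tools (a rough sketch; the INSTANCE lines from ec2-describe-instances include each running instance's public DNS name):
ec2-describe-instances | grep INSTANCE | grep running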
Test login to master node
hadoop-ec2 login <group>-cluster
Troubleshooting: If you didn't set up your keypair properly, you'll get:
[ec2-user@ip-10-123-22-179 ~]$ hadoop-ec2 login test-cluster
Logging in to host ec2-107-21-112-172.compute-1.amazonaws.com.
Warning: Identity file /home/ec2-user/.ec2/id_rsa--keypair not accessible: No such file or directory.
Permission denied (publickey,gssapi-with-mic).
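The usual fix (a sketch, assuming the shared private key is missing or has loose permissions) is to copy your group's key into ~/.ec2 and tighten its mode; /path/to/shared/ is a placeholder for wherever your group shares the file:
cp /path/to/shared/id_rsa-<group>-keypair ~/.ec2/
chmod 600 ~/.ec2/id_rsa-<group>-keypair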
Troubleshooting: http://wiki.apache.org/hadoop/AmazonEC2

Running a Map/Reduce Job

Copy the jar file to the master-node
scp -i ~/.ec2/id_rsa-<group>-keypair ~/hadoop-1.0.0/hadoop-examples-1.0.0.jar root@<master node>:/tmp
Get your master node's DNS name from the 'hadoop-ec2 login <group>-cluster' command; it will look something like this:
ec2-107-21-112-172.compute-1.amazonaws.com
(Optional) Copy your HDFS files to the master-node
Compress data for faster transfer
tar -cjvf data.bz2 <your data files>
scp -i ~/.ec2/id_rsa-<group>-keypair data.bz2 root@<master node>:/tmp
Upload the data to HDFS (HDFS is already set up on the nodes)
hadoop fs -put /tmp/<your data> <hdfs path>
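Note that the archive must be unpacked on the master node before loading; a quick sketch to unpack and then verify the upload afterwards (<hdfs path> is a placeholder):
cd /tmp && tar -xjvf data.bz2
hadoop fs -ls <hdfs path>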

Log in to the master node
hadoop-ec2 login <group>-cluster
Run the Map/Reduce job
hadoop jar /tmp/hadoop-examples-1.0.0.jar pi 10 10000000
Track job progress from the web
http://<master node>:50030
E.g. http://ec2-107-21-112-172.compute-1.amazonaws.com:50030
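When you're finished, shut the cluster down so the instances stop accruing charges (terminate-cluster is part of the same contrib scripts; it asks for confirmation before killing the instances):
hadoop-ec2 terminate-cluster <group>-cluster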


