Our company's Commerce Cloud platform offers a service for requesting hosts. Yesterday I gave it a try, requested 3 machines, and set up a Hadoop environment on them. The machine configuration:
emi-centos-6.4-x86_64
medium | 6 GB RAM | 2 virtual cores | 30.0 GB disk
The hostname and IP plan for the 3 machines is as follows:
IP address      Hostname    Role
192.168.0.101   hd1         namenode
192.168.0.102   hd2         datanode
192.168.0.103   hd3         datanode
I. System setup
(All steps must be performed on every node.)
1. Change the hostname and set up IP address resolution
1) Change the hostname
[root@hd1 toughhou]# hostname hd1
[root@hd1 toughhou]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1
2) Add IP-to-hostname mappings
[root@hd1 toughhou]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.101 hd1
192.168.0.102 hd2
192.168.0.103 hd3
3) Verify that it works
[toughhou@hd1 ~]$ ping hd2
PING hd2 (192.168.0.102) 56(84) bytes of data.
64 bytes from hd2 (192.168.0.102): icmp_seq=1 ttl=63 time=2.55 ms
[toughhou@hd1 ~]$ ping hd3
PING hd3 (192.168.0.103) 56(84) bytes of data.
64 bytes from hd3 (192.168.0.103): icmp_seq=1 ttl=63 time=2.48 ms
If the other hosts can be pinged by name, the mapping is working.
2. Disable the firewall
[root@hd1 toughhou]# chkconfig iptables off
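Note that chkconfig only keeps iptables from starting on the next boot; to stop the firewall that is already running, the usual companion command on CentOS 6 is:
[root@hd1 toughhou]# service iptables stop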
3. Passwordless SSH login
1) Generate the key pair
Log in to hd1 and append the content of the generated id_rsa.pub (the public key) to the authorized_keys file. Also log in to hd2 and hd3, generate id_rsa.pub on each, and copy the content of their id_rsa.pub files into authorized_keys on hd1. Finally, scp the resulting authorized_keys from hd1 to the .ssh directory on hd2 and hd3.
[toughhou@hd1 ~]$ ssh-keygen -t rsa
[toughhou@hd1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[toughhou@hd2 ~]$ ssh-keygen -t rsa
[toughhou@hd2 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[toughhou@hd3 ~]$ ssh-keygen -t rsa
[toughhou@hd3 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
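The commands above only append each node's key to its own local file. To actually collect the hd2 and hd3 public keys into authorized_keys on hd1 as described, one option (a sketch, assuming the toughhou account and default key paths; ssh-copy-id would also work) is to pipe each key over ssh:
[toughhou@hd2 ~]$ cat ~/.ssh/id_rsa.pub | ssh toughhou@hd1 'cat >> ~/.ssh/authorized_keys'
[toughhou@hd3 ~]$ cat ~/.ssh/id_rsa.pub | ssh toughhou@hd1 'cat >> ~/.ssh/authorized_keys'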
2) scp authorized_keys to hd2 and hd3
[toughhou@hd1 ~]$ scp ~/.ssh/authorized_keys 192.168.0.102:/home/toughhou/.ssh/
[toughhou@hd1 ~]$ scp ~/.ssh/authorized_keys 192.168.0.103:/home/toughhou/.ssh/
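If passwordless login still prompts for a password afterwards, sshd is usually rejecting the key because of file permissions; on each node the usual fix is:
[toughhou@hd1 ~]$ chmod 700 ~/.ssh
[toughhou@hd1 ~]$ chmod 600 ~/.ssh/authorized_keys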
3) Verify that ssh login no longer asks for a password
(A password is needed the first time; if the configuration is correct, no password is required afterwards.)
[toughhou@hd1 ~]$ ssh 192.168.0.102
[toughhou@hd2 ~]$
[toughhou@hd1 ~]$ ssh 192.168.0.103
[toughhou@hd3 ~]$
For passwordless SSH login you can also refer to the article “”, which describes the SSH setup in more detail.
II. Install the JDK and Hadoop, and set environment variables
1. Download the JDK and Hadoop packages
2. Unpack
[toughhou@hd1 software]$ tar zxvf jdk-7u65-linux-x64.gz
[toughhou@hd1 software]$ tar zxvf hadoop-2.4.0.tar.gz
[root@hd1 software]# mv hadoop-2.4.0 /opt/hadoop-2.4.0
[root@hd1 software]# mv jdk1.7.0_65 /opt/jdk1.7.0
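The directories were moved as root, but the daemons will run as toughhou and write logs under /opt/hadoop-2.4.0/logs, so it may save trouble later to hand ownership of that tree to the toughhou user (a sketch; it assumes toughhou has a group of the same name):
[root@hd1 software]# chown -R toughhou:toughhou /opt/hadoop-2.4.0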
Log in as root, edit /etc/profile, and append the following:
[root@hd1 software]# vi /etc/profile

#java
export JAVA_HOME=/opt/jdk1.7.0
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

#hadoop
export HADOOP_HOME=/opt/hadoop-2.4.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
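To make the new variables take effect in the current shell without logging out and back in, re-source the profile:
[root@hd1 software]# source /etc/profile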
Verify that both commands are now on the PATH:
[toughhou@hd1 hadoop]$ java -version
[toughhou@hd1 hadoop]$ hadoop
Usage: hadoop [--config confdir] COMMAND
III. Hadoop cluster configuration
1. Edit the Hadoop configuration files
[toughhou@hd1 hadoop]$ cd /opt/hadoop-2.4.0/etc/hadoop
1) hadoop-env.sh and yarn-env.sh: set the JAVA_HOME environment variable
At first I assumed that, since JAVA_HOME was already set in /etc/profile, hadoop-env.sh and yarn-env.sh would pick it up automatically and would not need to be changed. It turns out this does not work in hadoop-2.4.0: start-all.sh failed with "hd1: Error: JAVA_HOME is not set and could not be found."
Find the JAVA_HOME line in each of the two files and change it to the actual path.
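In both hadoop-env.sh and yarn-env.sh this amounts to a single line, using the JDK path from this setup:
export JAVA_HOME=/opt/jdk1.7.0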
2) slaves: this file lists all datanode hosts so the namenode knows where to find them
[toughhou@hd1 hadoop]$ vi slaves
hd2
hd3
3) core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hd1:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/temp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>hd1</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
4) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
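These paths (together with /hadoop/temp from core-site.xml) must exist as local directories writable by the user running the daemons. Something along these lines on each node prepares them (the toughhou group name is an assumption):
[root@hd1 toughhou]# mkdir -p /hadoop/name /hadoop/data /hadoop/temp
[root@hd1 toughhou]# chown -R toughhou:toughhou /hadoop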
5) mapred-site.xml
(If only mapred-site.xml.template exists in this directory, copy it to mapred-site.xml first.) The key setting here is mapreduce.framework.name, which tells MapReduce jobs to run on YARN:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
6) yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hd1:18040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hd1:18030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hd1:18025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hd1:18041</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hd1:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/hadoop/mynode/my</value>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/hadoop/mynode/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/logs</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
    <value>logs</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-check-interval-seconds</name>
    <value>-1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
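As with the HDFS paths, the NodeManager directories referenced above need to exist on the datanodes (hd2 and hd3) and be writable by the daemon user; a sketch, again assuming the toughhou group:
[root@hd2 toughhou]# mkdir -p /hadoop/mynode/my /hadoop/mynode/logs
[root@hd2 toughhou]# chown -R toughhou:toughhou /hadoop/mynode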
2. Copy the following files to the other nodes
[root@hd1 toughhou]# scp -r /opt/hadoop-2.4.0/ hd2:/opt/
[root@hd1 toughhou]# scp -r /opt/hadoop-2.4.0/ hd3:/opt/
[root@hd1 toughhou]# scp -r /opt/jdk1.7.0/ hd2:/opt/
[root@hd1 toughhou]# scp -r /opt/jdk1.7.0/ hd3:/opt/
[root@hd1 toughhou]# scp /etc/profile hd2:/etc/profile
[root@hd1 toughhou]# scp /etc/profile hd3:/etc/profile
[root@hd1 toughhou]# scp /etc/hosts hd2:/etc/hosts
[root@hd1 toughhou]# scp /etc/hosts hd3:/etc/hosts
Once the configuration is finished, reboot the machines so that the hostname and profile changes take effect on every node.
3. Format the namenode
This only needs to be done the first time; it is not needed afterwards.
[toughhou@hd1 bin]$ hdfs namenode -format
If the output ends with "Exiting with status 0", the format succeeded.
14/07/23 03:26:33 INFO util.ExitUtil: Exiting with status 0

4. Start the cluster
[toughhou@hd1 sbin]$ cd /opt/hadoop-2.4.0/sbin
[toughhou@hd1 sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hd1]
hd1: namenode running as process 12580. Stop it first.
hd2: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd2.out
hd3: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 12750. Stop it first.
starting yarn daemons
resourcemanager running as process 11900. Stop it first.
hd3: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd3.out
hd2: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd2.out
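As the deprecation message says, the same thing can be done with the two newer scripts, starting HDFS and YARN separately:
[toughhou@hd1 sbin]$ ./start-dfs.sh
[toughhou@hd1 sbin]$ ./start-yarn.sh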
5. Check the status of each node
[toughhou@hd1 sbin]$ jps
16358 NameNode
16535 SecondaryNameNode
16942 Jps
16683 ResourceManager
[toughhou@hd2 ~]$ jps
2253 NodeManager
2369 Jps
2152 DataNode
[toughhou@hd3 ~]$ jps
2064 NodeManager
2178 Jps
1963 DataNode
The output above shows that all daemons are running as expected.
6. Add convenient access from Windows
For easier access, we can also edit the %systemroot%\system32\drivers\etc\hosts file on our own Windows machine and add the following IP-to-hostname mappings:
192.168.0.101 hd1
192.168.0.102 hd2
192.168.0.103 hd3
This way we can open nodes from our own machine as http://hd2:8042/node instead of having to use http://192.168.0.102:8042/node.
7. wordcount test
To further verify the Hadoop environment, we can run one of the examples that ship with Hadoop.
wordcount is the classic Hadoop MapReduce example. We go to the corresponding directory and run the bundled example jar to check that the environment is working.
The steps:
1) Create directories on HDFS
[toughhou@hd1 ~]$ hadoop fs -mkdir -p /in/wordcount
[toughhou@hd1 ~]$ hadoop fs -mkdir /out/
2) Upload a file to HDFS
[toughhou@hd1 ~]$ cat in1.txt
Hello World , Hello China, Hello Shanghai
I love China
How are you
[toughhou@hd1 ~]$ hadoop fs -put in1.txt /in/wordcount
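To confirm the upload landed where expected, listing the directory is a quick check:
[toughhou@hd1 ~]$ hadoop fs -ls /in/wordcount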
3) Run wordcount
[toughhou@hd1 ~]$ cd /opt/hadoop-2.4.0/share/hadoop/mapreduce/
[toughhou@hd2 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /in/wordcount /out/out1
14/07/23 10:42:36 INFO client.RMProxy: Connecting to ResourceManager at hd1/192.168.0.101:18040
14/07/23 10:42:38 INFO input.FileInputFormat: Total input paths to process : 2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: number of splits:2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406105556378_0003
14/07/23 10:42:38 INFO impl.YarnClientImpl: Submitted application application_1406105556378_0003
14/07/23 10:42:38 INFO mapreduce.Job: The url to track the job: http://hd1:8088/proxy/application_1406105556378_0003/
14/07/23 10:42:38 INFO mapreduce.Job: Running job: job_1406105556378_0003
14/07/23 10:42:46 INFO mapreduce.Job: Job job_1406105556378_0003 running in uber mode : false
14/07/23 10:42:46 INFO mapreduce.Job: map 0% reduce 0%
14/07/23 10:42:55 INFO mapreduce.Job: map 100% reduce 0%
14/07/23 10:43:01 INFO mapreduce.Job: map 100% reduce 100%
4) View the results
[toughhou@hd2 mapreduce]$ hadoop fs -cat /out/out4/part-r-00000
, 1
China 1
China, 1
Hello 3
How 1
I 1
Shanghai 1
World 1
are 1
love 1
you 1
That's it. The whole hadoop-2.4.0 cluster setup is complete.