Setting Up a hadoop-2.4.0 Cluster on CentOS 6.4 - 白红宇

Published: 2019-06-23

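Our company's Commerce Cloud platform offers a service for requesting hosts. I tried it out yesterday: I requested three machines and set up a Hadoop environment on them. The machines are configured as follows: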

emi-centos-6.4-x86_64

medium | 6 GB RAM | 2 virtual cores | 30.0 GB disk

The hostnames and IP addresses planned for the three machines:

IP address       Hostname   Role
192.168.0.101    hd1        namenode
192.168.0.102    hd2        datanode
192.168.0.103    hd3        datanode

I. System Setup

(Run every step in this part on all nodes.)

1. Set the hostname and IP address resolution

1) Set the hostname

[root@hd1 toughhou]# hostname hd1
[root@hd1 toughhou]# cat /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1

2) Add the IP-to-hostname mappings

[root@hd1 toughhou]# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.0.101 hd1
192.168.0.102 hd2
192.168.0.103 hd3
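Because the same three mappings have to be added on every node, an idempotent append can help avoid duplicate entries if the step is repeated. The following is only a minimal sketch and not part of the original procedure: the helper name add_host_entry is hypothetical, and the demo writes to a scratch file so it can be tried without root. On the real nodes the target file would be /etc/hosts.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME   (hypothetical helper, not from the original post)
# Appends "IP NAME" to FILE unless NAME is already listed as a hostname there.
add_host_entry() {
    # awk exits 0 only if NAME appears as any field after the address column.
    if ! awk -v name="$3" \
        '{ for (i = 2; i <= NF; i++) if ($i == name) found = 1 } END { exit !found }' \
        "$1" 2>/dev/null; then
        printf '%s %s\n' "$2" "$3" >> "$1"
    fi
}

# Demo against a scratch file that stands in for /etc/hosts.
hosts_file=$(mktemp)
printf '127.0.0.1 localhost localhost.localdomain\n' > "$hosts_file"

add_host_entry "$hosts_file" 192.168.0.101 hd1
add_host_entry "$hosts_file" 192.168.0.102 hd2
add_host_entry "$hosts_file" 192.168.0.103 hd3
add_host_entry "$hosts_file" 192.168.0.101 hd1   # repeated call adds nothing

cat "$hosts_file"
```

Run as root with /etc/hosts as the first argument, the same calls would add the hd1, hd2 and hd3 lines shown above.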

3) Verify

[toughhou@hd1 ~]$ ping hd2
PING hd2 (192.168.0.102) 56(84) bytes of data.
64 bytes from hd2 (192.168.0.102): icmp_seq=1 ttl=63 time=2.55 ms
[toughhou@hd1 ~]$ ping hd3
PING hd3 (192.168.0.103) 56(84) bytes of data.
64 bytes from hd3 (192.168.0.103): icmp_seq=1 ttl=63 time=2.48 ms

If the pings succeed, name resolution is working.

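2. Disable the firewall

[root@hd1 toughhou]# chkconfig iptables off

chkconfig only changes the boot-time setting. The running firewall stays up until the machine is rebooted (a reboot comes later in this guide) or until it is stopped with service iptables stop.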

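3. Passwordless SSH login

1) Generate the private and public keys

Log in to hd1 and append the generated id_rsa.pub (the public key) to the authorized_keys file. Also log in to hd2 and hd3, generate an id_rsa.pub on each, and copy the contents of each into authorized_keys on hd1. Finally, scp that file from hd1 into the .ssh directory on hd2 and hd3. By default ssh-keygen writes the key pair into ~/.ssh, so the paths below point there.

[toughhou@hd1 ~]$ ssh-keygen -t rsa
[toughhou@hd1 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[toughhou@hd2 ~]$ ssh-keygen -t rsa
[toughhou@hd2 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[toughhou@hd3 ~]$ ssh-keygen -t rsa
[toughhou@hd3 ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys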

2) scp authorized_keys to hd2 and hd3

[toughhou@hd1 ~]$ scp ~/.ssh/authorized_keys 192.168.0.102:/home/toughhou/.ssh/
[toughhou@hd1 ~]$ scp ~/.ssh/authorized_keys 192.168.0.103:/home/toughhou/.ssh/
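A common reason for key-based login still asking for a password is that ~/.ssh or authorized_keys is writable by other users, which sshd rejects when StrictModes is enabled. The sketch below shows one way to merge public keys into authorized_keys while keeping the usual 700/600 permissions. The helper name install_pubkey, the scratch directory and the placeholder key strings are all assumptions made for illustration; none of them come from the original post.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE SSH_DIR   (hypothetical helper)
# Appends the key in PUBKEY_FILE to SSH_DIR/authorized_keys once, and
# tightens permissions on the directory and the file.
install_pubkey() {
    mkdir -p "$2" && chmod 700 "$2"
    touch "$2/authorized_keys"
    # -x: whole-line match, -F: fixed strings, -f: read the key line from the file
    grep -qxF -f "$1" "$2/authorized_keys" || cat "$1" >> "$2/authorized_keys"
    chmod 600 "$2/authorized_keys"
}

# Demo in a scratch directory with placeholder keys (not real key material).
demo_dir=$(mktemp -d)
printf 'ssh-rsa AAAAplaceholder1 toughhou@hd1\n' > "$demo_dir/hd1.pub"
printf 'ssh-rsa AAAAplaceholder2 toughhou@hd2\n' > "$demo_dir/hd2.pub"

install_pubkey "$demo_dir/hd1.pub" "$demo_dir/.ssh"
install_pubkey "$demo_dir/hd2.pub" "$demo_dir/.ssh"
install_pubkey "$demo_dir/hd1.pub" "$demo_dir/.ssh"   # duplicate, skipped

ls -l "$demo_dir/.ssh/authorized_keys"
cat "$demo_dir/.ssh/authorized_keys"
```

On a real node the call would look like install_pubkey ~/.ssh/id_rsa.pub ~/.ssh.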

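3) Verify that SSH login no longer asks for a password

(A password is required the first time; once everything is configured correctly, it is no longer needed.)

[toughhou@hd1 ~]$ ssh 192.168.0.102
[toughhou@hd2 ~]$
[toughhou@hd1 ~]$ ssh 192.168.0.103
[toughhou@hd3 ~]$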
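The manual check above can also be scripted. This is a minimal sketch, assuming the standard OpenSSH client: with BatchMode=yes, ssh fails instead of prompting, so a non-zero exit means the login would have needed a password (or the host was unreachable, or its host key has not been accepted yet). The function name check_passwordless is hypothetical.

```shell
#!/bin/sh
# check_passwordless HOST   (hypothetical helper)
# Runs the no-op command "true" on HOST without allowing any prompt.
check_passwordless() {
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true 2>/dev/null; then
        echo "$1: passwordless login OK"
    else
        echo "$1: still needs a password, or is unreachable"
    fi
}

# Hosts are taken from the command line, for example:
#   sh check_ssh.sh hd2 hd3
for host in "$@"; do
    check_passwordless "$host"
done
```

The script name check_ssh.sh in the usage comment is likewise only an example.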

For more on passwordless SSH login, see the article “”, which covers the SSH setup in more detail.

II. Install the JDK and Hadoop, and Set Environment Variables

1. Download the JDK and Hadoop packages

2. Extract them

[toughhou@hd1 software]$ tar zxvf jdk-7u65-linux-x64.gz
[toughhou@hd1 software]$ tar zxvf hadoop-2.4.0.tar.gz
[root@hd1 software]# mv hadoop-2.4.0 /opt/hadoop-2.4.0
[root@hd1 software]# mv jdk1.7.0_65 /opt/jdk1.7.0

3. Set the Java environment variables

Log in as root, edit /etc/profile, and add the following:

[root@hd1 software]# vi /etc/profile
#java
export JAVA_HOME=/opt/jdk1.7.0
export JRE_HOME=$JAVA_HOME/jre
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=./:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
#hadoop
export HADOOP_HOME=/opt/hadoop-2.4.0
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/lib
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native

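4. Verify the environment variables

The new variables only take effect in a fresh login shell, or after running source /etc/profile in the current one.

[toughhou@hd1 hadoop]$ java -version
[toughhou@hd1 hadoop]$ hadoop
Usage: hadoop [--config confdir] COMMAND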

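III. Hadoop Cluster Configuration

1. Edit the Hadoop configuration files

[toughhou@hd1 hadoop]$ cd /opt/hadoop-2.4.0/etc/hadoop

1) hadoop-env.sh and yarn-env.sh: set the JAVA_HOME environment variable

At first I assumed that, because JAVA_HOME was already set in /etc/profile, hadoop-env.sh and yarn-env.sh would pick it up and nothing more was needed. That turned out not to work in hadoop-2.4.0: start-all.sh failed with "hd1: Error: JAVA_HOME is not set and could not be found." The likely reason is that the start scripts launch the daemons over ssh in non-login shells, which do not read /etc/profile.

In both files, find the JAVA_HOME line and change it to the actual path (here /opt/jdk1.7.0).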
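The edit can also be made non-interactively. The sketch below is one possible way, assuming GNU sed (as shipped with CentOS) and assuming the file contains a line of the form "export JAVA_HOME=...", possibly commented out. The helper name set_java_home is hypothetical, and the scratch file only imitates that kind of line; it is not a copy of the real hadoop-env.sh.

```shell
#!/bin/sh
# set_java_home FILE JDK_DIR   (hypothetical helper)
set_java_home() {
    # Rewrite any existing "export JAVA_HOME=" line, commented out or not ...
    sed -i "s|^[#[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$2|" "$1"
    # ... and append the line if the file had none to rewrite.
    grep -qxF "export JAVA_HOME=$2" "$1" || echo "export JAVA_HOME=$2" >> "$1"
}

# Demo on a scratch file standing in for hadoop-env.sh.
env_file=$(mktemp)
printf '%s\n' '# The java implementation to use.' \
              'export JAVA_HOME=${JAVA_HOME}' > "$env_file"

set_java_home "$env_file" /opt/jdk1.7.0
cat "$env_file"
```

On the cluster, the same function would be called once for /opt/hadoop-2.4.0/etc/hadoop/hadoop-env.sh and once for yarn-env.sh in the same directory.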

2) slaves

This file lists all datanode hosts, one per line, so that the namenode can find them.

[toughhou@hd1 hadoop]$ vi slaves
hd2
hd3

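3) core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hd1:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/temp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>hd1</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>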

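4) hdfs-site.xml

<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/hadoop/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/hadoop/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>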

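5) mapred-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hd1:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/hadoop/temp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>hadoop.proxyuser.root.hosts</name>
        <value>hd1</value>
    </property>
    <property>
        <name>hadoop.proxyuser.root.groups</name>
        <value>*</value>
    </property>
</configuration>

Note that this listing is identical to the core-site.xml listing. For MapReduce jobs to be submitted to YARN, as the wordcount job log in step 7 shows they were, mapred-site.xml also needs the property mapreduce.framework.name set to yarn.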

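6) yarn-site.xml

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hd1:18040</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hd1:18030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hd1:18025</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>hd1:18041</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>hd1:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/hadoop/mynode/my</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/hadoop/mynode/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/logs</value>
    </property>
    <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>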
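The configuration files above point at several local directories under /hadoop (temp, name, data, mynode/my and mynode/logs), but the procedure does not show them being created. Hadoop can usually create missing directories itself when the daemon user may write to the parent; whether that was the case for /hadoop on these machines is not stated. The sketch below pre-creates the directories. The helper name make_hadoop_dirs is hypothetical, and the demo uses a scratch directory so it can run without root.

```shell
#!/bin/sh
# make_hadoop_dirs BASE   (hypothetical helper)
# Creates the local directories named in core-site.xml, hdfs-site.xml and
# yarn-site.xml underneath BASE.
make_hadoop_dirs() {
    for sub in temp name data mynode/my mynode/logs; do
        mkdir -p "$1/$sub" || return 1
    done
}

# Demo in a scratch directory standing in for /hadoop.
demo_base=$(mktemp -d)
make_hadoop_dirs "$demo_base"
find "$demo_base" -type d | sort

# On the real nodes, as root (the user name is taken from the shell prompts
# in this article, which is an assumption about who runs the daemons):
#   make_hadoop_dirs /hadoop && chown -R toughhou /hadoop
```

The /logs path in yarn.nodemanager.remote-app-log-dir is left out on purpose, since that setting refers to a location on HDFS rather than on the local disk.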

2. Copy the following files to the other nodes

scp copies directories recursively with a lowercase -r.

[root@hd1 toughhou]# scp -r /opt/hadoop-2.4.0/ hd2:/opt/
[root@hd1 toughhou]# scp -r /opt/hadoop-2.4.0/ hd3:/opt/
[root@hd1 toughhou]# scp -r /opt/jdk1.7.0/ hd2:/opt/
[root@hd1 toughhou]# scp -r /opt/jdk1.7.0/ hd3:/opt/
[root@hd1 toughhou]# scp /etc/profile hd2:/etc/profile
[root@hd1 toughhou]# scp /etc/profile hd3:/etc/profile
[root@hd1 toughhou]# scp /etc/hosts hd2:/etc/hosts
[root@hd1 toughhou]# scp /etc/hosts hd3:/etc/hosts

Once the configuration is complete, reboot the machines.

3. Format the namenode

This is only needed the first time; skip it on later starts.

[toughhou@hd1 bin]$ hdfs namenode -format

If the output ends with "Exiting with status 0", the format succeeded.

14/07/23 03:26:33 INFO util.ExitUtil: Exiting with status 0

4. Start the cluster

[toughhou@hd1 sbin]$ cd /opt/hadoop-2.4.0/sbin
[toughhou@hd1 sbin]$ ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hd1]
hd1: namenode running as process 12580. Stop it first.
hd2: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd2.out
hd3: starting datanode, logging to /opt/hadoop-2.4.0/logs/hadoop-toughhou-datanode-hd3.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: secondarynamenode running as process 12750. Stop it first.
starting yarn daemons
resourcemanager running as process 11900. Stop it first.
hd3: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd3.out
hd2: starting nodemanager, logging to /opt/hadoop-2.4.0/logs/yarn-toughhou-nodemanager-hd2.out

5. Check the status of each node

[toughhou@hd1 sbin]$ jps
16358 NameNode
16535 SecondaryNameNode
16942 Jps
16683 ResourceManager
[toughhou@hd2 ~]$ jps
2253 NodeManager
2369 Jps
2152 DataNode
[toughhou@hd3 ~]$ jps
2064 NodeManager
2178 Jps
1963 DataNode
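Reading the jps listings by eye works for three nodes, but the same check is easy to script. The following is a sketch only: the function name missing_daemons is hypothetical, and the sample text is modelled on the hd1 listing above rather than captured from a live cluster.

```shell
#!/bin/sh
# missing_daemons "JPS_OUTPUT" NAME...   (hypothetical helper)
# Prints each expected daemon name that is absent from the given jps output.
missing_daemons() {
    jps_output=$1
    shift
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second field exactly.
        printf '%s\n' "$jps_output" \
            | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' \
            || echo "$daemon"
    done
}

# Sample modelled on the namenode listing.
sample='16358 NameNode
16535 SecondaryNameNode
16942 Jps
16683 ResourceManager'

# Nothing is printed when every expected daemon is present.
missing_daemons "$sample" NameNode SecondaryNameNode ResourceManager

# Daemons that belong on the datanodes are reported as missing here.
missing_daemons "$sample" DataNode NodeManager
```

On a live node the first argument would be "$(jps)", with NameNode, SecondaryNameNode and ResourceManager expected on hd1 and DataNode and NodeManager expected on hd2 and hd3.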

The output above shows that every daemon is running.

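6. Add hostname shortcuts on Windows

For convenience, you can also edit %systemroot%\system32\drivers\etc\hosts on your Windows machine and add the following IP-to-hostname mappings:

192.168.0.101 hd1
192.168.0.102 hd2
192.168.0.103 hd3

The nodes can then be reached from your own machine at http://hd2:8042/node instead of http://192.168.0.102:8042/node.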

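7. wordcount test

To verify the Hadoop environment further, we can run one of the examples that ships with Hadoop.

wordcount is the classic MapReduce example. We go to the directory that holds the bundled jar and run it to check that the cluster works.

The steps are:

1) Create directories on HDFS

In Hadoop 2.x, -mkdir needs -p to create missing parent directories such as /in.

[toughhou@hd1 ~]$ hadoop fs -mkdir -p /in/wordcount
[toughhou@hd1 ~]$ hadoop fs -mkdir /out/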

2) Upload a file to HDFS

[toughhou@hd1 ~]$ cat in1.txt
Hello World , Hello China, Hello Shanghai
I love China
How are you
[toughhou@hd1 ~]$ hadoop fs -put in1.txt /in/wordcount

3) Run wordcount

[toughhou@hd1 ~]$ cd /opt/hadoop-2.4.0/share/hadoop/mapreduce/
[toughhou@hd2 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.4.0.jar wordcount /in/wordcount /out/out1
14/07/23 10:42:36 INFO client.RMProxy: Connecting to ResourceManager at hd1/192.168.0.101:18040
14/07/23 10:42:38 INFO input.FileInputFormat: Total input paths to process : 2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: number of splits:2
14/07/23 10:42:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406105556378_0003
14/07/23 10:42:38 INFO impl.YarnClientImpl: Submitted application application_1406105556378_0003
14/07/23 10:42:38 INFO mapreduce.Job: The url to track the job: http://hd1:8088/proxy/application_1406105556378_0003/
14/07/23 10:42:38 INFO mapreduce.Job: Running job: job_1406105556378_0003
14/07/23 10:42:46 INFO mapreduce.Job: Job job_1406105556378_0003 running in uber mode : false
14/07/23 10:42:46 INFO mapreduce.Job: map 0% reduce 0%
14/07/23 10:42:55 INFO mapreduce.Job: map 100% reduce 0%
14/07/23 10:43:01 INFO mapreduce.Job: map 100% reduce 100%

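4) View the result

The result is read from the output directory given to the job (/out/out1 in the command above).

[toughhou@hd2 mapreduce]$ hadoop fs -cat /out/out1/part-r-00000
, 1
China 1
China, 1
Hello 3
How 1
I 1
Shanghai 1
World 1
are 1
love 1
you 1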
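As a cross-check, the same counts can be computed locally with standard text tools, with no cluster involved. This sketch assumes that the bundled wordcount example splits its input on whitespace only, which is why "China" and "China," are counted as different words. The variable names are illustrative.

```shell
#!/bin/sh
# Recreate the sample input used in the test above.
input=$(mktemp)
cat > "$input" <<'EOF'
Hello World , Hello China, Hello Shanghai
I love China
How are you
EOF

# Put one whitespace-separated token on each line, drop empty lines, then
# count identical tokens. LC_ALL=C sorts by byte value, so punctuation and
# uppercase letters come before lowercase ones.
local_counts=$(tr -s '[:space:]' '\n' < "$input" \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }')

printf '%s\n' "$local_counts"
```

If the cluster is healthy, the job output and this local pipeline should list the same words with the same counts.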

That completes the setup of the hadoop-2.4.0 cluster.

Reposted from: https://www.cnblogs.com/toughhou/p/3864170.html
