Hadoop
1. Installation and Configuration
1.1 Setting up the Raspberry Pi
(1) Hardware and software preparation
A Raspberry Pi
An SD card formatting tool
The official Raspberry Pi imaging tool
A Raspberry Pi operating system (the official OS is recommended)
(2) Flash the OS image
While imaging, enable SSH and configure Wi-Fi (set the hostnames to master, slave01, and slave02 respectively).
(3) Connect and configure the network
Open a hotspot app on your phone to find each Raspberry Pi's IP address.
Then connect with ssh nudt@192.168.225.211 and enter the password.
1.2 Installing and configuring the JDK
(1) Transfer the downloaded jdk-8u241-linux-arm64-vfp-hflt.tar.gz to all three Raspberry Pis via Termius.
(2) Extract it
tar -zxvf jdk-8u241-linux-arm64-vfp-hflt.tar.gz
sudo mkdir /usr/lib/jvm/
sudo mv jdk1.8.0_241/ /usr/lib/jvm/
(3) Configure environment variables
The file to edit is /etc/profile:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_241
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$PATH"
Then make the changes take effect.
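For example, reload the profile in the current shell:
source /etc/profile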
(4). 设置系统默认jdk
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.8.0_241/bin/java 300 sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.8.0_241/bin/javac 300 sudo update-alternatives --install /usr/bin/jar jar /usr/lib/jvm/jdk1.8.0_241/bin/jar 300 sudo update-alternatives --install /usr/bin/javah javah /usr/lib/jvm/jdk1.8.0_241/bin/javah 300 sudo update-alternatives --install /usr/bin/javap javap /usr/lib/jvm/jdk1.8.0_241/bin/javap 300 sudo update-alternatives --config java
(5) Verify that Java was installed successfully
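A quick check (the exact version string depends on your build):
java -version
javac -version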
1.3 Installing and configuring Hadoop
(1) Download (only needed on master)
Either fetch it directly:
wget --no-check-certificate https://repo.huaweicloud.com/apache/hadoop/common/hadoop-3.3.2/hadoop-3.3.2.tar.gz
or use the tarball I provide.
(2) Extract it (only needed on master)
tar -zxvf hadoop-3.3.2.tar.gz
mv hadoop-3.3.2 ~/hadoop/
(3) Configure the Hadoop environment variables, then apply them with source /etc/profile.
sudo vim /etc/profile
export HADOOP_HOME=/home/nudt/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
(4) Verify that Hadoop is installed (only needed on master)
$ hadoop
Usage: hadoop [OPTIONS] SUBCOMMAND [SUBCOMMAND OPTIONS]
 or    hadoop [OPTIONS] CLASSNAME [CLASSNAME OPTIONS]
  where CLASSNAME is a user-provided Java class

  OPTIONS is none or any of:

buildpaths                       attempt to add class files from build tree
--config dir                     Hadoop config directory
--debug                          turn on shell script debug mode
--help                           usage information
hostnames list[,of,host,names]   hosts to use in slave mode
hosts filename                   list of hosts to use in slave mode
loglevel level                   set the log4j level for this command
workers                          turn on worker mode

  SUBCOMMAND is one of:

    Admin Commands:

daemonlog     get/set the log level for each daemon

    Client Commands:

archive       create a Hadoop archive
checknative   check native Hadoop and compression libraries availability
classpath     prints the class path needed to get the Hadoop jar and the required libraries
conftest      validate configuration XML files
credential    interact with credential providers
distch        distributed metadata changer
distcp        copy file or directories recursively
dtutil        operations related to delegation tokens
envvars       display computed Hadoop environment variables
fs            run a generic filesystem user client
gridmix       submit a mix of synthetic job, modeling a profiled from production load
jar <jar>     run a jar file. NOTE: please use "yarn jar" to launch YARN applications, not this command.
jnipath       prints the java.library.path
kdiag         Diagnose Kerberos Problems
kerbname      show auth_to_local principal conversion
key           manage keys via the KeyProvider
rumenfolder   scale a rumen input trace
rumentrace    convert logs into a rumen trace
s3guard       manage metadata on S3
trace         view and modify Hadoop tracing settings
version       print the version

    Daemon Commands:

kms           run KMS, the Key Management Server
registrydns   run the registry DNS server

SUBCOMMAND may print help when invoked w/o parameters or with -h.
(5) :star: Change the hostnames and configure the network mapping
The hostnames were already set during the initial configuration, so this is only a recap: sudo vim /etc/hostname
Then edit the hosts file: sudo vim /etc/hosts
192.168.239.28 master
192.168.239.211 slave01
192.168.239.254 slave02
Also edit the network-mapping template: sudo vim /etc/cloud/templates/hosts.debian.tmpl
(Note: keep only the entries below and delete all other IPv4 entries!)
## template:jinja
{#
This file (/etc/cloud/templates/hosts.debian.tmpl) is only utilized
if enabled in cloud-config.  Specifically, in order to enable it
you need to add the following to config:
  manage_etc_hosts: True
-#}
# Your system has configured 'manage_etc_hosts' as True.
# As a result, if you wish for changes to this file to persist
# then you will need to either
# a.) make changes to the master file in /etc/cloud/templates/hosts.debian.tmpl
# b.) change or remove the value of 'manage_etc_hosts' in
#     /etc/cloud/cloud.cfg or cloud-config from user-data
#
{# The value '{{hostname}}' will be replaced with the local-hostname -#}
#127.0.1.1 {{fqdn}} {{hostname}}
#127.0.0.1 localhost
192.168.239.28 master
192.168.239.211 slave01
192.168.239.254 slave02

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
Note that the subnet used here must match the one your SSH connection uses. Finally, copy the files to the slaves:
scp /etc/cloud/templates/hosts.debian.tmpl root@slave01:/etc/cloud/templates/hosts.debian.tmpl
scp /etc/cloud/templates/hosts.debian.tmpl root@slave02:/etc/cloud/templates/hosts.debian.tmpl
scp /etc/hosts root@slave01:/etc/hosts
scp /etc/hosts root@slave02:/etc/hosts
(6) Configure passwordless SSH login
ssh-keygen -t rsa                            # just press Enter at every prompt
cd /home/nudt/.ssh
cat id_rsa.pub >> authorized_keys
ssh-copy-id -i ./id_rsa.pub nudt@slave01     # each host copies its key to the other two
ssh-copy-id -i ./id_rsa.pub nudt@slave02
You should now see the key files (id_rsa, id_rsa.pub, authorized_keys) under ~/.ssh.
Passwordless login needs to be set up between every pair of hosts.
Now test passwordless remote login.
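For example, from master (using the user and hostnames set up above):
ssh nudt@slave01
exit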
If no password is requested, the test succeeded.
(7) Configure Hadoop
a) Configure core-site.xml
This is Hadoop's global configuration file; here we mainly configure the HDFS entry point (that is, the NameNode address) and the location where HDFS stores the data it generates at runtime.
cd /home/nudt/hadoop/etc/hadoop
vim core-site.xml
mkdir /home/nudt/hadoop_data/
mkdir /home/nudt/hadoop_data/tmp
mkdir /home/nudt/hadoop_data/dfs/
mkdir /home/nudt/hadoop_data/dfs/name
mkdir /home/nudt/hadoop_data/dfs/data
sudo mkdir -p /usr/container/logs
Adjust the following as needed and paste it in:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/nudt/hadoop_data/tmp</value>
  </property>
</configuration>
Parameter notes:
fs.defaultFS: the address of the HDFS NameNode
hadoop.tmp.dir: the directory where Hadoop stores files generated at runtime
b) Configure hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>192.168.239.28:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.239.28:50090</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/nudt/hadoop_data/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/nudt/hadoop_data/dfs/data</value>
  </property>
</configuration>
Only one point needs explanation: since we have three machines in total, configured as one master and two slaves, the SecondaryNameNode also runs on the master.
c) yarn-site.xml
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/usr/container/logs</value>
  </property>
</configuration>
d) mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>${hadoop.tmp.dir}/mr-history/tmp</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>${hadoop.tmp.dir}/mr-history/done</value>
  </property>
</configuration>
e) Configure the workers file
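The original does not show the file contents. Assuming all three nodes should run DataNode/NodeManager (consistent with dfs.replication=3 above), /home/nudt/hadoop/etc/hadoop/workers would contain one hostname per line; drop master if it should not act as a worker:
master
slave01
slave02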
f) Now distribute Hadoop to the slave machines
scp -r /home/nudt/hadoop nudt@slave01:/home/nudt/
scp -r /home/nudt/hadoop nudt@slave02:/home/nudt/
This takes quite a while, so be patient...
1.4 Verifying the installation
Before starting the Hadoop cluster, the NameNode must be formatted first.
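For example, on master (standard Hadoop command):
hdfs namenode -format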
1) Start and stop HDFS
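The original omits the commands; the standard scripts are:
start-dfs.sh
stop-dfs.sh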
Start and stop YARN
start-yarn.sh
stop-yarn.sh
Start or stop everything
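The commands are not shown in the original; the standard scripts are:
start-all.sh
stop-all.sh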
Start and stop the job history (log) server
mr-jobhistory-daemon.sh start historyserver
mr-jobhistory-daemon.sh stop historyserver
Check the running daemons with jps
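With the configuration above, jps on master should roughly show NameNode, SecondaryNameNode, and ResourceManager (plus DataNode and NodeManager if master is also listed in workers), while the slaves show DataNode and NodeManager:
jps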
1.5 Switch configuration (optional)
(1) Configure a static IP on the Windows machine
Right-click to open the network settings and choose "Change adapter options".
Select "Ethernet", right-click "Properties", and choose "Internet Protocol Version 4 (TCP/IPv4)".
Configure it with a static address on the same 192.168.239.x subnet as the Raspberry Pis.
(2) Configure the IPs of the three Raspberry Pis (keep them consistent with the addresses used earlier)
Insert each SD card into the computer and open the system-boot partition.
Edit cmdline.txt.
Append the static-IP line shown below, and repeat this step for each Pi. Make sure the addresses match the ones above, otherwise you will have to redo the whole configuration. After that, everything comes back up normally.
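The original figure with the exact line is not reproduced; a plausible form uses the kernel's ip= boot parameter appended to the single line of cmdline.txt (gateway and netmask are assumptions; adjust to your network), e.g. for master:
ip=192.168.239.28::192.168.239.1:255.255.255.0:master:eth0:off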
1.6 Troubleshooting
a. JAVA_HOME is not set
vim /home/nudt/hadoop/etc/hadoop/hadoop-env.sh
# add this line:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_241
b. master: Permission denied
The cause is not insufficient permissions, but that the public key of the user running on master has not been added to the corresponding host.
First switch to the root user:
sudo passwd    # set the password to root
su root
Generate a key pair:
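As before (accept the defaults at every prompt):
ssh-keygen -t rsa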
Add it to the local root account's trusted keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Send the public key to the slaves. To allow this, first perform the same switch-to-root procedure on the slave machines and modify the /etc/ssh/sshd_config file:
sudo vim /etc/ssh/sshd_config
Find the line PermitRootLogin prohibit-password and change it to PermitRootLogin yes.
Restart the service with service sshd restart, and use the root user for all of the following operations!
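With root login enabled on the slaves, the public key can now be sent, for example:
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave01
ssh-copy-id -i ~/.ssh/id_rsa.pub root@slave02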
Finally, add the following configuration (when running as root):
sudo vim /etc/profile
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Append these lines at the end. Set them to whichever user you actually use to log in and run the cluster; it does not have to be root.
c. Fixing "could only be written to 0 of the 1 minReplication"
See: https://blog.csdn.net/sinat_38737592/article/details/101628357
2. Using Hadoop
2.1 Basic HDFS file operations
Shell commands
When accessing the cluster remotely, add the -fs parameter:
-fs <file:///|hdfs://namenode:port>   specify default filesystem URL to use, overrides 'fs.defaultFS' property from configurations.
-fs hdfs://master:9000
Create a file
hadoop fs -touch /tmp/exp.tx
Write to a file
echo "<Text to append>" | hadoop fs -appendToFile - /aaa/aa.txt
hadoop fs -appendToFile {src} {dst}
Delete a file
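The command is not shown in the original; a typical form (add -r for directories) is:
hadoop fs -rm /tmp/exp.tx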
Download a file
[-get [-f] [-p] [-crc] [-ignoreCrc] [-t <thread count>] [-q <thread pool queue size>] <src> ... <localdst>]
hadoop fs -get <src> <localdst>
Most importantly, -t sets the number of threads.
Rename/move
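Not shown in the original; the standard form is:
hadoop fs -mv <src> <dst>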
Copy
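Not shown in the original; the standard form is:
hadoop fs -cp <src> <dst>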
View detailed information
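Not shown in the original; typical commands are:
hadoop fs -ls -h /tmp
hadoop fs -du -h /tmp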
Configure permissions (this is required!)
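The exact commands are not shown in the original; a plausible example, opening up a working directory so the remote client user can write to it (path and mode are assumptions), is:
hadoop fs -chmod -R 777 /tmp
hadoop fs -chown -R nudt /tmp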
For more commands, see hadoop fs -help
Java API operations
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

import java.io.*;
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

public class FileManager {

    public static String nameNode = "192.168.239.28:9000";
    public static URI hdfsHost;

    static {
        try {
            hdfsHost = new URI("hdfs://192.168.239.28:9000");
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
    }

    // Create a file containing "Hello World"; overwrite it if it already exists.
    public static void createHelloWorld(Configuration cf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        byte[] buff = "Hello World".getBytes(StandardCharsets.UTF_8);
        if (!fs.exists(new Path(filePath))) {
            FSDataOutputStream fos = fs.create(new Path(filePath));
            fos.write(buff, 0, buff.length);
            System.out.println("Create a new file: " + filePath + " with HelloWorld");
            fos.close();
        } else {
            System.out.println("Will overwrite file:\t" + filePath);
            System.out.println("Add contents to:\t" + filePath + " with HelloWorld");
            FSDataOutputStream fos = fs.create(new Path(filePath), true);
            fos.write(buff, 0, buff.length);
            fos.close();
        }
        fs.close();
    }

    // Check whether a path exists.
    public static void fileExist(Configuration cf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        if (fs.exists(new Path(filePath))) {
            System.out.println(filePath + "\tExists!");
        } else {
            System.out.println(filePath + "\tNot Exists!");
        }
    }

    // Read a file line by line and print it.
    public static void readFile(Configuration cf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        FSDataInputStream open = fs.open(new Path(filePath));
        BufferedReader bfr = new BufferedReader(new InputStreamReader(open));
        System.out.println("Begin read: " + filePath);
        String contentLine = bfr.readLine();
        while (contentLine != null) {
            System.out.println(contentLine);
            contentLine = bfr.readLine();
        }
        bfr.close();
    }

    // Delete a file (non-recursive).
    public static void deleteFile(Configuration cf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        if (fs.delete(new Path(filePath), false)) {
            System.out.println(filePath + "\tdelete success!");
        } else {
            System.out.println(filePath + "\tdelete fail!");
        }
    }

    // List the entries under a directory, first files only, then files and subdirectories.
    public static void showDir(Configuration cf, String filePath) throws IOException {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path(filePath), false);
        while (listFiles.hasNext()) {
            LocatedFileStatus next = listFiles.next();
            System.out.println(next.getPath().getName());
        }
        FileStatus[] listStatus = fs.listStatus(new Path(filePath));
        for (FileStatus list : listStatus) {
            System.out.println(list.getPath().getName());
        }
    }

    // Upload a local file to HDFS, printing a progress message per buffer-sized chunk.
    public static void uploadFile(Configuration cf, String localstr, String dst) throws Exception {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        InputStream in = new FileInputStream(localstr);
        OutputStream out = fs.create(new Path(dst), new Progressable() {
            @Override
            public void progress() {
                System.out.println("Uploaded one buffer-sized chunk of the file!");
            }
        });
        IOUtils.copyBytes(in, out, cf);
        System.out.println("LocalFile:\t" + localstr + "\tupload to: " + dst);
    }

    // Download an HDFS file to the local filesystem.
    public static void downloadFile(Configuration cf, String remoteStr, String localString) throws Exception {
        FileSystem fs = FileSystem.get(hdfsHost, cf);
        InputStream in = fs.open(new Path(remoteStr));
        OutputStream out = new FileOutputStream(localString);
        IOUtils.copyBytes(in, out, cf);
        System.out.println("downloadFile:\t" + remoteStr + " to " + localString);
    }

    public static void main(String[] args) throws Exception {
        Configuration cf = new Configuration();
        String path = "/tmp/dem0.txt";
        cf.set("fs.defaultFS", "hdfs://" + nameNode);
        System.out.println("[*]createHelloWorld:");
        createHelloWorld(cf, path);
        System.out.println("[*]showDir:");
        showDir(cf, "/tmp");
        System.out.println("[*]fileExist:");
        fileExist(cf, path);
        System.out.println("[*]readFile:");
        readFile(cf, path);
        System.out.println("[*]deleteFile:");
        deleteFile(cf, path);
        System.out.println("[*]showDir:");
        showDir(cf, "/");
        System.out.println("[*]uploadFile:");
        uploadFile(cf, "/etc/passwd", "/tmp/passwd");
    }
}
2.2 Setting up a VS Code + Maven + Hadoop development environment
Based on the iotdevelop environment.
Install VS Code
wget "https://vscode.cdn.azure.cn/stable/4af164ea3a06f701fe3e89a2bcbb421d2026b68f/code_1.68.0-1654690107_amd64.deb?1" -O code.deb
sudo dpkg -i ./code.deb
Install Maven
sudo apt-get install maven
export M2_HOME=/usr/share/maven    # add this line to /etc/profile
Configure the Aliyun mirror for Maven
See the references for details.
sudo vim /usr/share/maven/conf/settings.xml
Configure VS Code
Install the Java Extension Pack extension.
Start configuring.
Create a new project.
Right-click in an empty area.
For the remaining two prompts, enter whatever values you like, then confirm.
Choose the directory where you want to store the project, then wait.
Just press Enter at the prompt. Then import the dependencies.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-api</artifactId>
  <version>${hadoop.version}</version>
</dependency>
Note where these go: inside the <dependencies> section of pom.xml, with hadoop.version defined in <properties>. Then just wait; once the dependencies finish loading you can start writing code.
References
1. HDFS command-line operations: https://zhuanlan.zhihu.com/p/271098213
2. HDFS API operations: https://blog.csdn.net/little_sloth/article/details/107040607
3. VS Code + Maven + Hadoop development: https://www.cnblogs.com/orion-orion/p/15664772.html
4. Installing Maven on Ubuntu: https://cloud.tencent.com/developer/article/1649751
5. Java permissions: https://blog.csdn.net/qq_43541746/article/details/115422142
6. Introduction to MapReduce: https://www.runoob.com/w3cnote/mapreduce-coding.html