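Ubuntu 20.04 has been out for a while, and since I have recently been learning Hadoop, I decided to install and run Hadoop on it.
Install Java
Hadoop is built on Java, so Java must be installed first. Java 8 is the recommended version; Java 11, which this guide installs, also works here, although official Java 11 runtime support only begins with Hadoop 3.3.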
sudo apt update && sudo apt upgrade
sudo apt install openjdk-11-jdk
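Before moving on it can help to confirm which Java version is actually active. The sketch below is only an illustration: `java_major` is a hypothetical helper name, and it assumes the usual `java -version` banner format with the version in double quotes.

```shell
# java_major BANNER: print the major Java version found in a `java -version` banner.
# Handles both the legacy "1.8.0_252" scheme and the modern "11.0.7" scheme.
java_major() {
    local banner="$1" version
    # Pull out the first quoted token, e.g. 11.0.7 or 1.8.0_252.
    version=$(printf '%s\n' "$banner" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0_252 -> 8.0_252
    esac
    # Keep only the digits before the first separator.
    printf '%s\n' "${version%%[._-]*}"
}

# Only query the real JVM when one is installed (java prints its banner on stderr).
if command -v java >/dev/null 2>&1; then
    echo "Detected Java major version: $(java_major "$(java -version 2>&1)")"
fi
```

If the reported number is not the one you expect, `sudo update-alternatives --config java` lets you pick a different installed JDK on Ubuntu.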
Create a dedicated user and configure SSH
Create a dedicated user for Hadoop:
sudo useradd -m hadoop -s /bin/bash
sudo passwd hadoop
sudo adduser hadoop sudo
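Warning: if your server is reachable from the public internet, choose a strong password.
Hadoop uses SSH to start and manage its daemons, so set up passwordless key-based SSH login: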
su - hadoop
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
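A quick way to check the result is sketched below. `has_mode` and `check_passwordless_ssh` are hypothetical helper names, and `stat -c` assumes GNU coreutils as shipped with Ubuntu.

```shell
# has_mode FILE MODE: succeed when FILE's octal permission bits equal MODE.
has_mode() {
    [ "$(stat -c '%a' "$1")" = "$2" ]
}

# check_passwordless_ssh: try a non-interactive login to localhost.
# BatchMode=yes forbids password prompts, so a broken key setup fails
# immediately instead of hanging on a prompt.
check_passwordless_ssh() {
    ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true
}

# Usage, as the hadoop user:
#   has_mode ~/.ssh/authorized_keys 600 && echo "permissions ok"
#   check_passwordless_ssh && echo "passwordless SSH works"
```

The first connection to localhost still asks you to accept the host key, so run a plain `ssh localhost` once by hand before relying on the batch check.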
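Download Hadoop
Do not switch users; keep working as the newly created hadoop user.
Hadoop website: Apache Hadoop releases list
At the time of writing the latest version was 3.2.1. Download the binary package (superseded releases are moved from downloads.apache.org to archive.apache.org/dist/hadoop/common/, so adjust the URL if the download returns 404):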
wget https://downloads.apache.org/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
tar xvzf hadoop-*.tar.gz
mv hadoop-3.2.1 hadoop
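It is worth verifying the tarball before relying on it. Apache releases normally ship a `.sha512` file next to each archive; the sketch below assumes you have copied the hex digest out of that file by hand, and `sha512_matches` is a hypothetical helper name.

```shell
# sha512_matches FILE EXPECTED_HEX: succeed when FILE's SHA-512 digest
# equals EXPECTED_HEX. The comparison ignores letter case.
sha512_matches() {
    local actual expected
    actual=$(sha512sum "$1" | awk '{print $1}')
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    [ "$actual" = "$expected" ]
}

# Usage (the digest is a placeholder, paste the published value):
#   sha512_matches hadoop-3.2.1.tar.gz "<digest from the .sha512 file>" \
#       && echo "checksum ok" || echo "checksum MISMATCH"
```

Run the check against the downloaded archive, not the extracted directory.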
Configure Hadoop
Edit the environment variables
Edit ~/.bashrc and append the following at the end:
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Apply the changes to the current shell:
source ~/.bashrc
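To confirm that the new PATH entries took effect, a small check like the following can be used. `path_contains` is a hypothetical helper, not part of Hadoop.

```shell
# path_contains DIR: succeed when DIR is one of the colon-separated PATH entries.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage after sourcing ~/.bashrc:
#   path_contains "$HADOOP_HOME/bin"  && echo "bin is on PATH"
#   path_contains "$HADOOP_HOME/sbin" && echo "sbin is on PATH"
```

Once both directories are on the PATH, `hadoop version` should print the release you unpacked.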
Set JAVA_HOME
Open hadoop-env.sh and set the JAVA_HOME environment variable by adding the export line:
nano ~/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
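If you are unsure of the JDK path on your machine, it can usually be derived from the `java` binary itself. This is a sketch that assumes the binary lives under `<JAVA_HOME>/bin/java`; `java_home_from_bin` is a hypothetical helper name.

```shell
# java_home_from_bin PATH: strip the trailing /bin/java from a resolved java binary path.
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

# Resolve the symlink chain (/usr/bin/java -> /etc/alternatives/java -> real binary)
# and print the directory to use as JAVA_HOME. Skipped when java is not installed.
if command -v java >/dev/null 2>&1; then
    java_home_from_bin "$(readlink -f "$(command -v java)")"
fi
```

Compare the printed directory with the entries under /usr/lib/jvm before copying it into hadoop-env.sh.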
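Edit the configuration files
Edit the following four files, placing each <configuration> block in the file opened just before it: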
cd $HADOOP_HOME/etc/hadoop
nano core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
nano hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hadoopdata/hdfs/datanode</value>
</property>
</configuration>
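Hadoop usually creates the storage directories named in hdfs-site.xml on its own, but creating them up front as the hadoop user surfaces permission problems early. A minimal sketch, where `make_hdfs_dirs` is a hypothetical helper:

```shell
# make_hdfs_dirs BASE: create the NameNode and DataNode directories under BASE.
make_hdfs_dirs() {
    mkdir -p "$1/hdfs/namenode" "$1/hdfs/datanode"
}

# Usage for the paths configured in this guide:
#   make_hdfs_dirs /home/hadoop/hadoopdata
```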
nano mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
nano yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
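A typo in any of these XML files tends to show up only later as a confusing startup error, so it can be useful to print what each file actually defines. The sketch below is a plain-text helper, not a real XML parser: `list_props` is a hypothetical name, and it assumes every <name> and <value> element sits on its own line, as in the snippets in this guide.

```shell
# list_props FILE: print "name=value" for each property in a Hadoop *-site.xml file.
list_props() {
    sed -n -e 's:.*<name>\(.*\)</name>.*:\1:p' \
           -e 's:.*<value>\(.*\)</value>.*:\1:p' "$1" |
        paste -d '=' - -    # join each name line with the value line that follows it
}

# Usage:
#   for f in core-site hdfs-site mapred-site yarn-site; do
#       list_props "$HADOOP_HOME/etc/hadoop/$f.xml"
#   done
```

Once the Hadoop commands are on the PATH, `hdfs getconf -confKey <key>` reports the value Hadoop itself resolves for a given key.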
Format the NameNode
cd ~
hdfs namenode -format
Start the single-node pseudo-distributed Hadoop cluster
cd $HADOOP_HOME/sbin
./start-dfs.sh
./start-yarn.sh
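After both scripts finish, the JDK's `jps` tool lists the running Java processes. In a pseudo-distributed setup you would expect five daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reports any that are absent; `missing_daemons` is a hypothetical helper that reads `jps` output on standard input.

```shell
# missing_daemons: read `jps` output on stdin and print each expected daemon that is absent.
missing_daemons() {
    local output name
    output=$(cat)
    for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -w matches whole words, so "NameNode" is not satisfied by "SecondaryNameNode".
        printf '%s\n' "$output" | grep -qw "$name" || echo "$name"
    done
}

# Usage:
#   jps | missing_daemons
```

No output means all five daemons are up. If one is missing, its log file under $HADOOP_HOME/logs is the first place to look.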
Check the cluster status
curl 127.0.0.1:9870
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="REFRESH" content="0;url=dfshealth.html" />
<title>Hadoop Administration</title>
</head>
</html>
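The page returned above is only a stub: its meta refresh tag sends the browser on to dfshealth.html, the NameNode web UI, so receiving it means the NameNode is serving HTTP on port 9870. To script that check, the redirect target can be pulled out of the response. This is a sketch, with `refresh_target` as a hypothetical helper that assumes the tag layout shown in the output above.

```shell
# refresh_target: read HTML on stdin and print the URL named in a
# <meta http-equiv="REFRESH" content="N;url=..."> tag.
refresh_target() {
    sed -n 's/.*content="[0-9]*; *url=\([^"]*\)".*/\1/p' | head -n 1
}

# Usage:
#   curl -s 127.0.0.1:9870 | refresh_target
```

The YARN ResourceManager has its own web UI, on port 8088 by default, which can be probed with curl in the same way.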
Document information
- Author: last2win
- Link: https://last2win.com/ubuntu-20.04-install-hadoop/
- Copyright: free to repost with attribution, non-commercial, no derivatives (Creative Commons 3.0 license)