1. Environment

OS: Linux 4.19.90-17.ky10.aarch64 (aarch64)

Docker: 18.09.0

Kubernetes: 1.19.0

2. Deployment Materials

Base image: centos:centos7.9.2009 (arm64/v8)

JDK: JDK 1.8 (OpenJDK8U-jdk_aarch64_linux_hotspot_8u275b01.tar.gz)

Hadoop: hadoop-2.9.1.tar.gz

3. Building the Image

There is no officially designated Hadoop image, and the available community images come with little usage documentation, so a custom image is built here for later use.

3.1. Edit the Dockerfile

vi Dockerfile

FROM centos:centos7.9.2009

# ADD extracts the tarballs into /opt automatically
ADD OpenJDK8U-jdk_aarch64_linux_hotspot_8u275b01.tar.gz /opt
ADD hadoop-2.9.1.tar.gz /opt

# The minimal base image does not ship the "which" command, which the Hadoop scripts expect
RUN yum install -y which

COPY bootstrap.sh /root/bootstrap/
RUN chmod 777 /root/bootstrap/bootstrap.sh

ENV JAVA_HOME /opt/jdk8u275-b01
ENV HADOOP_HOME /opt/hadoop-2.9.1
ENV PATH $JAVA_HOME/bin:$PATH

 

Note: the yum install -y which step needs network access and will fail or time out in an offline environment. This can be worked around by configuring a proxy (just replace RUN yum install -y which with the following):

ENV https_proxy=http://172.15.1.114:3128 \
    http_proxy=http://172.15.1.114:3128 \
    HTTP_PROXY=http://172.15.1.114:3128 \
    HTTPS_PROXY=http://172.15.1.114:3128 \
    no_proxy="localhost,localdomain,127.0.0.1,172.15.1.0/24" \
    NO_PROXY="localhost,localdomain,127.0.0.1,172.15.1.0/24"

 

RUN yum install -y which

 

ENV https_proxy="" \

        http_proxy="" \

        HTTP_PROXY="" \

        HTTPS_PROXY="" \

        no_proxy="" \

        NO_PROXY=""

 

3.2. Write the Bootstrap Script

vi bootstrap.sh

 

#!/bin/bash

cd /root/config

# Don't override slaves, core-site.xml and yarn-site.xml
rm -f $HADOOP_HOME/etc/hadoop/slaves $HADOOP_HOME/etc/hadoop/core-site.xml $HADOOP_HOME/etc/hadoop/yarn-site.xml

# Copy the original Hadoop config files to $HADOOP_CONF_DIR
cp -a $HADOOP_HOME/etc/hadoop/* $HADOOP_CONF_DIR

# Get this node's FQDN
FQDN=`ping $HOSTNAME -c 1 | grep PING | awk '{print $2}'`

# If this node is the nameNode, write its FQDN into core-site.xml and yarn-site.xml
if [[ "$NODE_TYPE" =~ "NN" ]]; then
  # Apply the custom config file contents
  for cfg in ./*; do
    if [[ ! "$cfg" =~ bootstrap.sh ]]; then
      cat $cfg > $HADOOP_CONF_DIR/${cfg##*/}
    fi
  done

  # Record the nameNode's FQDN in a shared file
  echo $FQDN > $HADOOP_CONF_DIR/NameNode

  # Replace the ${NAME_NODE_FQDN} placeholder with the real FQDN
  sed -i 's/${NAME_NODE_FQDN}/'$FQDN'/g' `grep '${NAME_NODE_FQDN}' -rl $HADOOP_CONF_DIR`

  # Format HDFS if it has not been formatted yet
  if [[ ! -e $HADOOP_CONF_DIR/hdfs-namenode-format.out ]]; then
    $HADOOP_HOME/bin/hdfs namenode -format -force -nonInteractive &> $HADOOP_CONF_DIR/hdfs-namenode-format.out
  fi

  # Start the Hadoop nameNode daemon
  $HADOOP_HOME/sbin/hadoop-daemon.sh start namenode
fi

# If this node is the resourceManager, start it
if [[ "$NODE_TYPE" =~ "RM" ]]; then
  $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
fi

# If this node is a nodeManager, add it to slaves
if [[ "$NODE_TYPE" =~ "NM" ]]; then
  sed -i '/'$FQDN'/d' $HADOOP_CONF_DIR/slaves
  echo $FQDN >> $HADOOP_CONF_DIR/slaves

  # Wait until the nameNode has published NAME_NODE_FQDN
  while [[ ! -e $HADOOP_CONF_DIR/NameNode || -z $NAME_NODE_FQDN ]]; do
    echo "Waiting for nameNode to set NAME_NODE_FQDN" && sleep 2 && NAME_NODE_FQDN=`cat $HADOOP_CONF_DIR/NameNode`
  done

  # Wait for the resourceManager to answer, then start the nodeManager daemon
  while [[ -z `curl -sf http://$NAME_NODE_FQDN:8088/ws/v1/cluster/info` ]]; do
    echo "Waiting for $NAME_NODE_FQDN" && sleep 2
  done
  $HADOOP_HOME/sbin/yarn-daemon.sh start nodemanager
fi

# If this node is a dataNode, add it to slaves
if [[ "$NODE_TYPE" =~ "DN" ]]; then
  sed -i '/'$FQDN'/d' $HADOOP_CONF_DIR/slaves
  echo $FQDN >> $HADOOP_CONF_DIR/slaves

  # Wait until the nameNode has published NAME_NODE_FQDN
  while [[ ! -e $HADOOP_CONF_DIR/NameNode || -z $NAME_NODE_FQDN ]]; do
    echo "Waiting for nameNode to set NAME_NODE_FQDN" && sleep 2 && NAME_NODE_FQDN=`cat $HADOOP_CONF_DIR/NameNode`
  done

  # Wait for the nameNode web UI to answer, then start the dataNode daemon
  while [[ -z `curl -sf http://$NAME_NODE_FQDN:50070` ]]; do
    echo "Waiting for $NAME_NODE_FQDN" && sleep 2
  done
  $HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
fi

# Keep the container running
sleep infinity
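
One detail worth noting: [[ "$NODE_TYPE" =~ "NN" ]] performs a substring match, so a value such as NN,RM triggers both the NameNode and the ResourceManager branches. The script's syntax can be checked before baking it into the image:

bash -n bootstrap.sh    # parse only, does not execute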

3.3. Build the Image

docker build -t self_hadoop:2.9.1 .

3.4. Tag the Image and Push It to the Private Registry

docker tag self_hadoop:2.9.1 172.15.1.6:5000/hadoop:2.9.1

docker push 172.15.1.6:5000/hadoop:2.9.1
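
To confirm the push succeeded, the registry's v2 HTTP API can be queried (assuming 172.15.1.6:5000 is a standard Docker Registry v2 reachable without TLS or auth):

curl http://172.15.1.6:5000/v2/hadoop/tags/list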

4. Setting Up NFS

Hadoop's configuration has to be shared among the nodes, and every node must be able to edit the files, so a PersistentVolume that supports ReadWriteMany is required. NFS is the usual choice, but since this is a single-node trial, a local directory is used as the shared PV instead; for a multi-node setup, deploy NFS as documented elsewhere. A small preparation step follows below.
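
Since a local hostPath directory serves as the PV, it is safest to create the directory on the node beforehand (the path matches the PV definition in the next section):

mkdir -p /home/data/pv/hadoop/hadoop-config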

5. Create the PV and PVC

A PVC is used to hold the configuration files; hadoop-config-pvc will be mounted into every Pod of the Hadoop cluster.

vi hadoop-pv.yaml

 

apiVersion: v1

kind: PersistentVolume

metadata:

  name: hadoop-config-pv

  labels:

    release: hadoop-config

spec:

  capacity:

    storage: 256Mi

  accessModes:

    - ReadWriteMany

  persistentVolumeReclaimPolicy: Retain

  hostPath:

    path: "/home/data/pv/hadoop/hadoop-config"

---

apiVersion: v1

kind: PersistentVolumeClaim

metadata:

  name: hadoop-config-pvc

spec:

  accessModes:

    - ReadWriteMany

  resources:

    requests:

      storage: 256Mi

  selector:

    matchLabels:

      release: hadoop-config
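
Apply the manifest and confirm the claim is bound (standard kubectl commands):

kubectl apply -f hadoop-pv.yaml
kubectl get pv,pvc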

 

6. Modify the Necessary Hadoop Configuration Files and Mount Them into the Shared PVC

vi hadoop-configmap.yaml

 

apiVersion: v1

kind: ConfigMap

metadata:

  name: hadoop-custom-config-cm

  labels:

    app: hadoop

data:

  hdfs-site.xml: |-

    <?xml version="1.0" encoding="UTF-8"?>

    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!--

      Licensed under the Apache License, Version 2.0 (the "License");

      you may not use this file except in compliance with the License.

      You may obtain a copy of the License at

 

        http://www.apache.org/licenses/LICENSE-2.0

 

      Unless required by applicable law or agreed to in writing, software

      distributed under the License is distributed on an "AS IS" BASIS,

      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

      See the License for the specific language governing permissions and

      limitations under the License. See accompanying LICENSE file.

    -->

 

    <!-- Put site-specific property overrides in this file. -->

 

    <configuration>

      <property>

        <name>dfs.name.dir</name>

        <value>/root/hadoop/dfs/name</value>

      </property>

      <property>

        <name>dfs.data.dir</name>

        <value>/root/hadoop/dfs/data</value>

      </property>

      <property>

        <name>dfs.replication</name>

        <value>3</value>

      </property>

      <property>

        <name>dfs.rpc-bind-host</name>

        <value>0.0.0.0</value>

      </property>

      <property>

        <name>dfs.servicerpc-bind-host</name>

        <value>0.0.0.0</value>

      </property>

    </configuration>

  core-site.xml: |-

    <?xml version="1.0" encoding="UTF-8"?>

    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!--

      Licensed under the Apache License, Version 2.0 (the "License");

      you may not use this file except in compliance with the License.

      You may obtain a copy of the License at

 

        http://www.apache.org/licenses/LICENSE-2.0

 

      Unless required by applicable law or agreed to in writing, software

      distributed under the License is distributed on an "AS IS" BASIS,

      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

      See the License for the specific language governing permissions and

      limitations under the License. See accompanying LICENSE file.

    -->

 

    <!-- Put site-specific property overrides in this file. -->

 

    <configuration>

      <property>

        <name>fs.defaultFS</name>

        <value>hdfs://${NAME_NODE_FQDN}:9000</value>

      </property>

      <property>

        <name>hadoop.tmp.dir</name>

        <value>/root/hadoop/tmp</value>

      </property>

    </configuration>

  mapred-site.xml: |-

    <?xml version="1.0"?>

    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!--

      Licensed under the Apache License, Version 2.0 (the "License");

      you may not use this file except in compliance with the License.

      You may obtain a copy of the License at

 

        http://www.apache.org/licenses/LICENSE-2.0

 

      Unless required by applicable law or agreed to in writing, software

      distributed under the License is distributed on an "AS IS" BASIS,

      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

      See the License for the specific language governing permissions and

      limitations under the License. See accompanying LICENSE file.

    -->

 

    <!-- Put site-specific property overrides in this file. -->

 

    <configuration>

      <property>

        <name>mapreduce.framework.name</name>

        <value>yarn</value>

      </property>

    </configuration>

  yarn-site.xml: |-

    <?xml version="1.0"?>

    <!--

      Licensed under the Apache License, Version 2.0 (the "License");

      you may not use this file except in compliance with the License.

      You may obtain a copy of the License at

 

        http://www.apache.org/licenses/LICENSE-2.0

 

      Unless required by applicable law or agreed to in writing, software

      distributed under the License is distributed on an "AS IS" BASIS,

      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

      See the License for the specific language governing permissions and

      limitations under the License. See accompanying LICENSE file.

    -->

    <configuration>

 

    <!-- Site specific YARN configuration properties -->

      <property>

        <name>yarn.resourcemanager.hostname</name>

        <value>${NAME_NODE_FQDN}</value>

      </property>

      <property>

        <name>yarn.resourcemanager.bind-host</name>

        <value>0.0.0.0</value>

      </property>

      <property>

        <name>yarn.nodemanager.bind-host</name>

        <value>0.0.0.0</value>

      </property>

      <property>

        <name>yarn.timeline-service.bind-host</name>

        <value>0.0.0.0</value>

      </property>

      <property>

        <name>yarn.nodemanager.aux-services</name>

        <value>mapreduce_shuffle</value>

      </property>

      <property>

        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

        <value>org.apache.hadoop.mapred.ShuffleHandler</value>

      </property>

      <property>

        <name>yarn.nodemanager.vmem-check-enabled</name>

        <value>false</value>

      </property>

 

    </configuration>
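
Apply the ConfigMap (standard kubectl command):

kubectl apply -f hadoop-configmap.yaml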

 

7. Create the NameNode

vi hadoop-namenode.yaml

 

apiVersion: v1

kind: Service

metadata:

  name: hadoop-nn-service

  labels:

    app: hadoop-nn

spec:

  ports:

    - port: 9000

      name: hdfs

    - port: 50070

      name: name-node

  clusterIP: None

  selector:

    app: hadoop-nn

---

apiVersion: apps/v1

kind: StatefulSet

metadata:

  name: hadoop-nn

spec:

  replicas: 1

  revisionHistoryLimit: 10

  selector:

    matchLabels:

      app: hadoop-nn

  serviceName: hadoop-nn-service

  template:

    metadata:

      labels:

        app: hadoop-nn

    spec:

      containers:

        - name: hadoop-nn

          image: 172.15.1.6:5000/hadoop:2.9.1

          command: ["bash", "-c", "/root/bootstrap/bootstrap.sh"]

          securityContext:

            privileged: true

          env:

            - name: HADOOP_CONF_DIR

              value: /etc/hadoop

            - name: NODE_TYPE

              value: NN,RM

          volumeMounts:

            - name: hadoop-config-volume

              mountPath: /etc/hadoop

            - name: hadoop-custom-config-volume

              mountPath: /root/config

            - name: dfs-name-dir-volume

              mountPath: /root/hadoop/dfs/name

            - name: dfs-data-dir-volume

              mountPath: /root/hadoop/dfs/data

            - name: hadoop-tmp-dir-volume

              mountPath: /root/hadoop/tmp

      volumes:

        - name: hadoop-config-volume

          persistentVolumeClaim:

            claimName: hadoop-config-pvc

        - name: hadoop-custom-config-volume

          configMap:

            name: hadoop-custom-config-cm

        - name: dfs-name-dir-volume

          emptyDir: {}

        - name: dfs-data-dir-volume

          emptyDir: {}

        - name: hadoop-tmp-dir-volume

          emptyDir: {}

 

Notes:

1. The shared hadoop-config-pvc is mounted into the Pod.

2. A StatefulSet is used for deployment; backed by a headless Service, it gives each Pod a basically stable network identity.

3. No start command was specified when building the image, so command is used here to launch bootstrap.sh.

4. NODE_TYPE is set via an environment variable; currently the NameNode and ResourceManager run in the same container, and no SecondaryNameNode is run here.

5. Data storage can be provided by mounting disks or storage volumes, or by mapping storage with hostPath; for simplicity emptyDir is used here instead.

6. The namenode only needs to be formatted the first time the cluster starts, so after formatting, the output of the format command is stored in the shared storage and used to decide whether formatting has already been done.

7. Files in the shared storage (PVC) are not deleted automatically when the deployment is removed, so before redeploying, decide whether to clean up the config files in the PVC directory (especially the hdfs-namenode-format.out file).
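
Apply the manifest and wait for the Pod to become ready (standard kubectl commands; the app=hadoop-nn label comes from the manifest above):

kubectl apply -f hadoop-namenode.yaml
kubectl get pods -l app=hadoop-nn -w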

 

8. Create the DataNode

vi hadoop-datanode.yaml

 

apiVersion: v1

kind: Service

metadata:

  name: hadoop-dn-service

  labels:

    app: hadoop-dn

spec:

  ports:

    - port: 9000

      name: hdfs

    - port: 50010

      name: data-node-trans

    - port: 50075

      name: data-node-http

  clusterIP: None

  selector:

    app: hadoop-dn

---

apiVersion: apps/v1

kind: StatefulSet

metadata:

  name: hadoop-dn

spec:

  replicas: 3

  revisionHistoryLimit: 10

  selector:

    matchLabels:

      app: hadoop-dn

  serviceName: hadoop-dn-service

  template:

    metadata:

      labels:

        app: hadoop-dn

    spec:

      containers:

        - name: hadoop-dn

          image: 172.15.1.6:5000/hadoop:2.9.1

          command: ["bash", "-c", "/root/bootstrap/bootstrap.sh"]

          env:

            - name: HADOOP_CONF_DIR

              value: /etc/hadoop

            - name: NODE_TYPE

              value: DN,NM

          volumeMounts:

            - name: hadoop-config-volume

              mountPath: /etc/hadoop

            - name: hadoop-custom-config-volume

              mountPath: /root/config

            - name: dfs-name-dir-volume

              mountPath: /root/hadoop/dfs/name

            - name: dfs-data-dir-volume

              mountPath: /root/hadoop/dfs/data

            - name: hadoop-tmp-dir-volume

              mountPath: /root/hadoop/tmp

      volumes:

        - name: hadoop-config-volume

          persistentVolumeClaim:

            claimName: hadoop-config-pvc

        - name: hadoop-custom-config-volume

          configMap:

            name: hadoop-custom-config-cm

        - name: dfs-name-dir-volume

          emptyDir: {}

        - name: dfs-data-dir-volume

          emptyDir: {}

        - name: hadoop-tmp-dir-volume

          emptyDir: {}
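
Apply the manifest; with replicas: 3, three DataNode Pods should come up one after another (standard kubectl commands):

kubectl apply -f hadoop-datanode.yaml
kubectl get pods -l app=hadoop-dn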

 

9. Create the Service for External Access

vi hadoop-service.yaml

 

apiVersion: v1

kind: Service

metadata:

  name: hadoop-ui-service

  labels:

    app: hadoop-nn

spec:

  ports:

    - port: 8088

      name: resource-manager

      nodePort: 30088

    - port: 50070

      name: name-node

      nodePort: 30070

  selector:

    app: hadoop-nn

  type: NodePort
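
Apply the Service and note the NodePorts (standard kubectl commands):

kubectl apply -f hadoop-service.yaml
kubectl get svc hadoop-ui-service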

 

10. Verify the Deployment

 

10.1. Check the Processes

Exec into the corresponding Pods and run jps to check that the expected processes are present (figure 1: NN and RM; figure 2: DN and NM), for example as below.
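
Pod names follow the StatefulSet convention <name>-<ordinal>, and jps is on the PATH because the image puts $JAVA_HOME/bin there:

kubectl exec -it hadoop-nn-0 -- jps    # expect NameNode and ResourceManager
kubectl exec -it hadoop-dn-0 -- jps    # expect DataNode and NodeManager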

 

 

10.2. Check via the Web UI
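
Given the NodePort Service above, the NameNode UI should be reachable at http://<node-ip>:30070 and the ResourceManager UI at http://<node-ip>:30088. The registered DataNodes can also be listed from the command line (hadoop's bin directory is not on the PATH, so the full path is used):

kubectl exec -it hadoop-nn-0 -- /opt/hadoop-2.9.1/bin/hdfs dfsadmin -report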

 

 

Reference: https://blog.csdn.net/chenleiking/article/details/82467715
