Detailed Tutorial: Installing a Hadoop/Hive Cluster on VMware Virtual Machines (MacBook M1/M2) (Part 2)

For the Hadoop installation, see the previous article: Installing Hadoop.

Contents

  • Install Hive on Hadoop
    • Install MySQL on the namenode (hadoop1)
    • Install Hive
    • Test HiveQL

      Install Hive on Hadoop

      Install MySQL on the namenode (hadoop1)

      Hive is a data warehouse framework built on top of Hadoop.

      Hive uses MySQL to store its metadata.

      MySQL versions recommended for Hive:

      MySQL 5.7.x

      MySQL 8.0.x (only for Hive 3.1.x)

      This tutorial uses MySQL 8.0.x.

      1. Check for an existing MySQL installation and remove it
      sudo yum list installed | grep mysql
      # Remove any previously installed MySQL packages (replace ${xxx} with each package name listed above)
      sudo yum remove ${xxx}
      

      (For reference only) To delete the old MySQL data directory, run “sudo rm -rf /var/lib/mysql”. This permanently erases all existing MySQL data.

      2. Install mysql-8.0.37
      • See the official documentation: https://dev.mysql.com/doc/refman/8.0/en/linux-installation-yum-repo.html
      • Download the repository package: https://dev.mysql.com/get/mysql84-community-release-el7-1.noarch.rpm
      • Upload it to /opt/software
        sudo yum install /opt/software/mysql84-community-release-el7-1.noarch.rpm
        yum repolist all | grep mysql
        # Enable the 8.0 repositories (for 8.0.37) and disable the default 8.4 ones
        sudo yum-config-manager --disable mysql-8.4-lts-community
        sudo yum-config-manager --disable mysql-tools-8.4-lts-community
        sudo yum-config-manager --enable mysql80-community
        sudo yum-config-manager --enable mysql-tools-community
        # Install MySQL on hadoop1
        sudo yum install mysql-community-server
        
        3. Start MySQL as a background service
        sudo systemctl start mysqld
        sudo systemctl status mysqld
        # Stop the MySQL service
        # sudo systemctl stop mysqld
        # Disable start on boot
        # sudo systemctl disable mysqld
        
        4. Find the temporary password for the MySQL root user in the log; you will change it after logging in
        sudo grep 'temporary password' /var/log/mysqld.log
        [Server] A temporary password is generated for root@localhost: :vwyr%Eih0nx
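If you would rather not copy the password by hand, the log line above can be parsed with sed. This is only a minimal sketch: the helper name extract_temp_password is hypothetical, and it assumes the log line has the same format as the sample output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads mysqld log lines on stdin and prints the most
# recent temporary root password (everything after "root@localhost: ").
extract_temp_password() {
  sed -n 's/.*A temporary password is generated for root@localhost: //p' | tail -n 1
}

# Example using the sample line from this tutorial (no MySQL required).
sample='[Server] A temporary password is generated for root@localhost: :vwyr%Eih0nx'
printf '%s\n' "$sample" | extract_temp_password
```

On the real machine the same helper could be fed from the log, for example: sudo cat /var/log/mysqld.log | extract_temp_password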
        

        Change the root password. The new password must contain at least one uppercase letter, one lowercase letter, one digit, and one special character, and must be at least 8 characters long.

        mysql -u root -p
        # Change the password
        ALTER USER 'root'@'localhost' IDENTIFIED BY '{new password}';
        exit;
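The password rule described above can be checked locally before running ALTER USER. The following is a rough sketch, not MySQL's own validation logic: the function name check_mysql_password is hypothetical, and "special character" is approximated here as any non-alphanumeric character.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if the candidate password satisfies the
# rules described in this tutorial, 1 otherwise. It does not contact MySQL.
check_mysql_password() {
  local pw=$1
  [ "${#pw}" -ge 8 ] || return 1                        # at least 8 characters
  case $pw in *[[:upper:]]*) ;; *) return 1 ;; esac     # one uppercase letter
  case $pw in *[[:lower:]]*) ;; *) return 1 ;; esac     # one lowercase letter
  case $pw in *[[:digit:]]*) ;; *) return 1 ;; esac     # one digit
  case $pw in *[![:alnum:]]*) ;; *) return 1 ;; esac    # one non-alphanumeric character
  return 0
}

# The hive password used later in this tutorial passes the check.
if check_mysql_password 'Hive123456.'; then echo "ok"; else echo "too weak"; fi
```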
        
        5. Log in as root with the new password to verify it
        mysql -u root -p
        SHOW DATABASES;
        
        6. Create a new MySQL user “hive” with the password “Hive123456.” (the trailing period is part of the password)
        CREATE USER 'hive'@'localhost' IDENTIFIED BY 'Hive123456.';
        # Grant all privileges on all tables in all databases to hive
        GRANT ALL ON *.* TO 'hive'@'localhost';
        FLUSH PRIVILEGES;
        
        7. Allow the MySQL user hive to log in remotely. Check the “Host” column first: if it is “localhost”, the user can only log in locally
        SELECT Host, User FROM mysql.user WHERE User = 'hive';
        UPDATE mysql.user SET Host = '%' WHERE User = 'hive';
        FLUSH PRIVILEGES;
        exit;
        

        Install Hive

        1. Download hive-3.1.3 and extract it: https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-3.1.3/apache-hive-3.1.3-bin.tar.gz
        tar -zxvf /opt/software/apache-hive-3.1.3-bin.tar.gz -C /opt/modules/
        cd /opt/modules/
        mv apache-hive-3.1.3-bin hive-3.1.3
        
        2. Download the MySQL Java connector: https://downloads.mysql.com/archives/c-j/

        Install the connector, then copy mysql-connector-java.jar into /opt/modules/hive-3.1.3/lib

        sudo yum install /opt/software/mysql-connector-j-8.0.20-1.el7.noarch.rpm
        cp /usr/share/java/mysql-connector-java.jar /opt/modules/hive-3.1.3/lib
        
        3. Add Hive to the PATH environment variable
        sudo vim /etc/profile
        # Append the following lines
        export HIVE_HOME=/opt/modules/hive-3.1.3
        export PATH=$PATH:$HIVE_HOME/bin
        
        source /etc/profile
        
        4. Enable the default Hive configuration and create hive-site.xml for the custom settings
        cd /opt/modules/hive-3.1.3/conf
        mv hive-default.xml.template hive-default.xml
        touch hive-site.xml
        vim hive-site.xml
        
        # hive-site.xml
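        <configuration>
          <property>
            <name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
          </property>
          <property>
            <name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.cj.jdbc.Driver</value>
          </property>
          <property>
            <name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value>
          </property>
          <property>
            <name>javax.jdo.option.ConnectionPassword</name>
            <value>Hive123456.</value>
          </property>
        </configuration>
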
        5. Initialize the Hive metastore database
        schematool -dbType mysql -initSchema
        mysql -u hive -p
        SHOW DATABASES;
        

        After logging in, you should see that the hive_metastore database has been created.

        6. Start Hadoop first, then start Hive
        start-all.sh
        # wait for a while
        hive
        

        Test HiveQL

        1. Create a Hive table named employees
        CREATE TABLE employees (
          id INT,
          name STRING,
          salary FLOAT
        );
        
        2. Insert dummy data (Hive runs a MapReduce job under the hood)
        INSERT INTO employees VALUES
          (1, 'John Doe', 50000),
          (2, 'Jane Smith', 60000),
          (3, 'Mike Johnson', 55000);
        
        3. Query the data
        SELECT * FROM employees;
        
        # Exit Hive
        quit;
        
        4. Check the YARN web UI at http://hadoop1:8088 to see the MapReduce job

        5. Check the data files on HDFS

        hdfs dfs -ls /user/hive/warehouse
        hdfs dfs -ls /user/hive/warehouse/employees
        hdfs dfs -cat /user/hive/warehouse/employees/000000_3
        

        The data is stored as plain files (similar to CSV), which is why a metadata store is needed to map those files to structured tables.
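To see how such a file maps to columns, it can help to convert it to CSV by hand. This is a minimal sketch that assumes the table uses Hive's default text format, in which fields are separated by the Ctrl-A character (octal 001). The helper name hive_text_to_csv and the sample row are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: converts Hive's default text-file field delimiter
# (Ctrl-A, octal 001) to commas. Reads stdin, writes stdout.
# Note: this naive conversion does not quote values that contain commas.
hive_text_to_csv() {
  tr '\001' ','
}

# Sample row built locally so the example runs without a cluster.
printf '1\001John Doe\00150000.0\n' | hive_text_to_csv
```

On the cluster, the same helper could be applied to the file listed above, for example: hdfs dfs -cat /user/hive/warehouse/employees/000000_3 | hive_text_to_csv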

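        6. Alternatively, start the hiveserver2 service and connect to Hive remotely with third-party software such as DBeaver

        7. Update the Hadoop configuration: add proxyuser settings for “hadoop” so that the hadoop user may act as a proxy for other users from any host and for any group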

        # Stop Hadoop
        stop-all.sh
        
        # core-site.xml: append inside <configuration>
        <property>
          <name>hadoop.proxyuser.hadoop.hosts</name>
          <value>*</value>
        </property>
        <property>
          <name>hadoop.proxyuser.hadoop.groups</name>
          <value>*</value>
        </property>
        # Sync the Hadoop configuration to the other nodes
        rsync -avz /opt/modules/hadoop-3.4.0/etc/hadoop/ hadoop@hadoop2:/opt/modules/hadoop-3.4.0/etc/hadoop/
        rsync -avz /opt/modules/hadoop-3.4.0/etc/hadoop/ hadoop@hadoop3:/opt/modules/hadoop-3.4.0/etc/hadoop/
        # Start Hadoop
        start-all.sh
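With more worker nodes, repeating the rsync line by hand gets tedious. The sketch below generates one rsync command per host so that the commands can be reviewed before they are run. The function name print_sync_commands is hypothetical; the path and host names are the ones used in this tutorial.

```shell
#!/usr/bin/env bash
# Hadoop configuration directory used throughout this tutorial.
CONF_DIR=/opt/modules/hadoop-3.4.0/etc/hadoop/

# Hypothetical helper: prints (does not execute) one rsync command per host
# passed as an argument.
print_sync_commands() {
  local host
  for host in "$@"; do
    printf 'rsync -avz %s hadoop@%s:%s\n' "$CONF_DIR" "$host" "$CONF_DIR"
  done
}

print_sync_commands hadoop2 hadoop3
```

Once the printed commands look right, they could be executed by piping the output to a shell: print_sync_commands hadoop2 hadoop3 | sh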
        
        8. Connect to hiveserver2 with the built-in beeline client
        # Start hiveserver2
        hiveserver2
        

        Open a new terminal and connect with the built-in beeline client as an anonymous user.

        beeline -u jdbc:hive2://hadoop1:10000/
        show tables;
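The connection string above follows the pattern jdbc:hive2://host:port/database. As a small illustration, the sketch below assembles such a URL from its parts. The helper name hive2_url is hypothetical, and the defaults (port 10000, database "default") are assumptions chosen to match this tutorial's setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds a HiveServer2 JDBC URL from host, port and
# database. Port defaults to 10000 and database to "default".
hive2_url() {
  local host=$1 port=${2:-10000} db=${3:-default}
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

hive2_url hadoop1

# To run a single statement non-interactively (requires a running hiveserver2):
# beeline -u "$(hive2_url hadoop1)" -e 'show tables;'
```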
        
        • Check the HiveServer2 web UI: http://hadoop1:10002
        • You should see one anonymous connection

          9. Connect to hiveserver2 with the DBeaver client as an anonymous user
          • URL: jdbc:hive2://hadoop1:10000/
          • Port: 10000
          • No username or password (anonymous user)