HiveSQL的基础知识点

一、各数据类型的基础知识点

1.1 数值类型

整数

小数

float

double(常用)

decimal(针对高精度)

1.2 日期类型

date

datetime

timestamp

time

year

1.3 字符串类型

char

varchar / varchar2

blob /text

tinyblob / tinytext

mediumblob / mediumtext

longblob / longtext

string(常用)

二、Hive中的常用函数

2.1 单行函数

2.1.1 数值转换函数

round

取整

（可指定精度）四舍五入

floor(向下取整函数)

ceil(向上取整函数)

abs(绝对值函数)

rand(取随机数函数)

cast：类型转换

2.1.2 日期函数

from_unixtime(时间戳转日期函数)

unix_timestamp(日期转时间戳函数)

current_date：当前日期

current_timestamp：当前日期加时间，精确到毫秒

to_date(日期时间转日期函数)

year(日期转年函数)

month(日期转月函数)

day(日期转天函数)

hour(日期转小时函数)

minute(日期转分钟函数)

second(日期转秒函数)

weekofyear(日期转周函数)

datediff(日期比较函数)

date_add(日期增加函数)

date_sub(日期减少函数)

months_between

2.1.3 流程控制函数

if：条件判断

coalesce() /nvl(非空函数)

case when：条件判断

2.1.4 字符串函数

length(字符串长度函数)

reverse(字符串反转函数)

concat(字符串拼接函数)

concat_ws(带分隔符的字符串连接函数)

substr / substring(字符串截取函数)

upper / ucase (字符串转大写函数)

lower / lcase (字符串转小写函数)

trim / ltrim / rtrim：取出空格/左空格/右空格

space：空格字符串函数

repeat：重复字符串函数

lpad / rpad：左/右补足函数

split(分割字符串)

replace：替换字符串

regexp_replace(正则表达式的替换函数)

regexp_extract(正则表达式的解析函数)

get_json_object(json解析函数)：从路径中提取json对象

2.1.5 集合函数

size：集合中元素的个数

map：创建map集合

map_keys：返回map中的key

map_values：返回map中的value

2.2 聚合函数

2.2.1 count(个数统计)

2.2.2 sum(总和统计函数)

2.2.3 avg(平均值统计函数)

2.2.4 min(最小值统计函数)

2.2.5 max(最大值统计函数)

2.2.6 高级聚合函数

collect_list : 收集并形成list集合，结果不去重

collect_set：收集并形成set集合，结果去重

2.3 炸裂函数

2.3.1 explode(炸裂函数)

2.3.2 posexplode(炸裂函数，带有下角标pos)

2.3.3 lateral view 侧写视图

2.4 窗口函数

2.4.1 聚合函数

2.4.2 前后函数

lag

lead

2.4.3 头尾函数

first_value

last_value

2.4.4 排名函数

一、各数据类型的基础知识点

1.1 数值类型

int 代表整数（hive常用bigint）: float代表小数，double代表双精度，double比float精度更高，小数位更多。

整数

int

有符号(signed）的和无符号(unsigned)的。有符号的取值区间为-2147483648~2147483647，无符号的取值区间为0~ 4294967295。宽度最多为11个数字 -int(11)
- tinyint
  
  有符号和无符号的。有符号的取值范围是-128~127, 无符号的取值范围是0~255。宽度最多为4个数字 -tinyint(4)
  - smallint
    
    有符号和无符号的。有符号的取值范围是-32767~32767 ，无符号的取值范围是0~65535。宽度最多为4个数字 -tinyint(4)
    - mediumint
      
      有符号和无符号的。有符号的取值范围是-8388608~8388607 ，无符号的取值范围是0~16777215。宽度最多为9个数字 -mediumint(9)
      - bigint(常用)
        
        有符号的和无符号的。宽度最多为20个数字-bigint(20)
        
        小数
        
        float
        
        代表小数，默认是(10,2)
        
        double(常用)
        
        代表双精度，默认是(16,4)
        
        decimal(针对高精度)
        
        比float精度更高，小数位更多，默认为(16,4)
        
        1.2 日期类型
        
        date
        
        代表 YYYY-MM-DD格式，例如：1989-12-31
        
        datetime
        
        代表 YYYYY-MM-DD HH:MM:SS 格式，例如：1989-12-31 15:30:00
        
        timestamp
        
        代表时间戳，例如：1989年12月31日下午15:30，在数据库中存储为：19891231153000
        
        time
        
        代表 HH:MM:SS格式
        
        year
        
        以2位或4位格式存储年份值
        
        补充
        
        1.3 字符串类型
        
        char
        
        固定长度字符串（可以是汉字或字母），长度是1-255。如果内容小于指定长度，右边填充空格，如果不指定长度，默认是1。
        
        varchar / varchar2
        
        可变字符串，长度取值是1-255。创建表使用该类型时必须指定长度。
        
        blob /text
        
          最大长度65535，用于存储二进制大数据，如图片，无法指定长度。两者区别：blob类型对于大小写比较敏感。
        
        tinyblob / tinytext
        
        最大长度是255，不能指定长度。
        
        mediumblob / mediumtext
        
        最大长度是16777215字符。
        
        longblob / longtext
        
        最大长度是429496295字符。
        
        string(常用)
        
        二、Hive中的常用函数
        
        hive中有大量内置函数，一般可以分为：单行函数、聚合函数、炸裂函数、窗口函数
        
        查看内置函数
        
        show functions;
        
        查看内置函数用法
        
        desc function upper;
        
        查看内置函数详细信息
        
        desc function extended upper;
        
        2.1 单行函数
        
        单行函数的特点是一进一出，输入一行，输出一行。
        
        2.1.1 数值转换函数
        
        round
        
        取整
        
        语法：round(double a）
        返回值：bigint
        说明：返回double类型的整数值部分（遵循四舍五入）
        举例：select round(4.1578 ) --> 4
        
        （可指定精度）四舍五入
        
        语法：round(double a, int b)
        返回值：double
        说明：返回指定精度的double类型
        举例：select round(4.1578 , 2) --> 4.16
        
        floor(向下取整函数)
        
        语法：floor(double a)
        返回值：bigint
        说明：返回等于或者小于该double变量的最大整数
        举例：select floor(34.12) --> 34
        
        ceil(向上取整函数)
        
        语法：ceil(double a)
        返回值：bigint
        说明：返回大于或者等于该double变量的最小的整数
        举例：select ceil(34.12) --> 35
        
        abs(绝对值函数)
        
        语法：abs(double a ) 、abs(int a)
        返回值：double 、int
        说明：返回数值a的绝对值
        举例：select abs( - 3.9) --> 3.9
        
        rand(取随机数函数)
        
        语法：rand()
        返回值：double
        说明：返回[0,1]范围内的随机数；
        举例：select rand()
        
        cast：类型转换
        
        select cast(1 as bigint)
        1
        
        2.1.2 日期函数
        
        from_unixtime(时间戳转日期函数)
        
        语法：from_unixtime(bigint unixtime , string format)
        返回值：输入：bigint的时间戳，输出string格式化的时间
        说明：转化 unix时间戳（从 1970-01-01 00:00:00 UTC 到指定时间的秒数）到当前时区的时间格式，默认的format是yyyy-MM-dd HH:mm:ss，可以指定别的。
        举例：select from_unixtime(1323308943,'yyyyMMdd') --> 20111208
        unix_timestamp(日期转时间戳函数)
        
        语法：unix_timestamp(string date) 、unix_timestamp(string date,string pattern)
        返回值：bigint
        说明：将格式为"yyyy-MM-dd HH:mm:ss"的日期转换成 unix的时间戳。如果转换失败，则返回值为0；
        注意：输入时间必须是到秒级的时间，否则转换失败返回NULL
        
        举例：
        
        select  unix_timestamp() ---> 当前日期的时间戳
        select unix_timestamp('2024-02-01 22:17:23')  --> 1706825843
        select unix_timestamp('20240201 20:17:11','yyyyMMdd HH:mm:ss') --> 1706825843
        
        current_date：当前日期
        
        select current_date();
        2024-02-04
        
        current_timestamp：当前日期加时间，精确到毫秒
        
        select current_timestamp();
        2024-02-04 16:06:24.504000000
        
        to_date(日期时间转日期函数)
        
        语法：to_date(string timestamp)
        返回值：string
        说明：返回日期时间字段中的日期部分
        举例：select to_date('2024-02-01 22:17:23') ---> 2024-02-01
        year(日期转年函数)
        
        语法：year(string date)
        返回值：int
        说明：返回日期中的年份
        举例：select year('2024-02-01 22:17:23') ---> 2024
        month(日期转月函数)
        
        语法：month(string date)
        返回值：int
        说明：返回日期中的月份
        举例：select month('2024-02-01 22:17:23') ---> 2
        day(日期转天函数)
        
        语法：day(string date)
        返回值：int
        说明：返回日期中的天
        举例：select day('2024-02-01 22:17:23' ) ---> 1
        hour(日期转小时函数)
        
        语法：hour(string date)
        返回值：int
        说明：返回日期中的小时
        举例：select hour('2024-02-01 22:17:23') --->22
        minute(日期转分钟函数)
        
        语法：minute(string date)
        返回值：int
        说明：返回日期中的分钟
        举例：select minute('2024-02-01 22:17:23') ---> 17
        second(日期转秒函数)
        
        语法：second(string date)
        返回值：int
        说明：返回日期中的秒
        举例：select ('2024-02-01 22:17:23') ---> 23
        weekofyear(日期转周函数)
        
        语法：weekofyear(string date)
        返回值：int
        说明：返回该日期在当年的周数
        举例：select weekofyear ('2024-02-01 22:17:23') --->5
        datediff(日期比较函数)
        
        语法：datediff(string enddate,string startdate)
        返回值：int
        说明：返回结束日期减去开始日期的天数
        举例：select datediff ('2024-02-01','2024-01-28') ---> 4
        date_add(日期增加函数)
        
        语法：date_add(string startdate , int days)
        返回值：string
        说明：返回开始日期startdate 加上days天后的日期
        举例：select date_add('2024-02-01',10) ---> 2024-02-11
        date_sub(日期减少函数)
        
        语法：date_sub(string startdate,int days)
        返回值：string
        说明：返回开始日期startdate 减去days天后的日期
        举例：select  date_sub('2024-02-01',3) --->2024-01-29
        months_between
        
        语法：months_between(string enddate,string startdate)
        返回值：double
        说明：返回结束日期减去开始日期的月份数
        举例： select months_between('2024-02-01','2024-01-01') --> 1 select  months_between('2024-02-01','2024-01-11') --> 0.67741935
        
        2.1.3 流程控制函数
        
        if：条件判断
        
        类似Java中的三目运算符
        
        语法：if(boolean testCondition, T valueTrue, T valueFalseOrNull)
        返回值：T
        说明：当条件testCondition为TRUE时，返回valueTrue；否则返回valueFalseOrNull
        举例：select if(1=2,100,200) --> 200
        
        coalesce() ：空字段赋值
        
        语法：coalesce(T v1,T v2 ...)
        返回值：T
        说明：返回参数中的第一个非空值，如果所有值都为null,那么直接返回null
        举例：select coalesce(null,'100','50') --> 100
        
        nvl：空字段赋值
        
        语法：nvl( string1, replace_with)
        返回值：T
        说明：如果string1是null值，则nvl函数返回replace_with的值，否则返回string1的值，如果两个参数都为null，则返回null。
        举例：select nvl(null,'空') --> '空'
        
        两者区别
        
        1.coalesce coalesce函数语法是：coalesce(表达式1，表达式2.....表达式n); coalesce函数的返回结果是第一个非空表达式，如果全是空，则返回空。 coalesce函数的处理参数个数没有限制，使用时需要注意：对处理参数的数据类型有严格要求，所有表达式值是同一类型（转换成同一类型也可）。 2.nvl nvl函数语法是：nvl(默认值,表达式); 如果默认值不为空返回默认值，如果默认值是空，则返回表达式，如果两者都为空则返回空。 nvl函数的处理参数个数有限，只能传两个参数。使用时需注意：“默认值”，“表达式”的值的数据类型没有要求，可相同也可不同。
        
        case when：条件判断
        
        语法：case column when vlaue1 then result1
        when vlaue1 then result2
        else esult3
        end as column1
        返回值：T
        说明：如果column等于 vlaue1，那么返回result1 ; 如果column等于 vlaue2，那么返回result2;否则返回result3
        举例：
        select case 100  when 500 then 'false'
        when 100 then 'true'
        else '?'  end ; ---> true
        
        2.1.4 字符串函数
        
        length(字符串长度函数)
        
        语法：length(string A)
        返回值：int
        说明：返回字符串的长度
        举例：select length('sghanan') ; ---> 7
        
        reverse(字符串反转函数)
        
        语法：reverse(string A)
        返回值：string
        说明：返回字符串A的反转结果
        举例：select  reverse('sghanxh') ; ---> hxnahgs
        
        concat(字符串拼接函数)
        
        语法：concat(string A ,string B.......)
        返回值：string
        说明：返回字符串拼接后的结果，支持输入任意个数的字符串
        举例：select  concat('ad','cv','op') ;---> adcvop
        
        concat_ws(带分隔符的字符串连接函数)
        
        语法：concat_ws(string SEP, string A ,string B.......)
        返回值：string
        说明：返回输入字符串连接后的结果,SEP表示各个字符串的分隔符
        举例：select  concat_ws('|','ad','cv','op') ;---> ad|cv|op
        
        substr / substring(字符串截取函数)
        
        语法：subtr( string A , int start) , substring( string A , int start )
        参数start 代表的是：截取起始的位置(start从1 开始计数)
        
        返回值：string
        说明：截取从start 位置到结尾的字符串
        举例：select substr( 'abcde' ,3) --> 从第三个字符串开始截取，得到 cde
        select substr( 'abcde' ,-1) --> 从倒数第一个字符串开始截取，得到 e
        
        upper / ucase (字符串转大写函数)
        
        语法：upper(string A) 、 ucase(string A)
        返回值：string
        说明：返回字符串A的大写格式
        举例：select upper('absSedF') ----> ABSSEDF
        
        lower / lcase (字符串转小写函数)
        
        语法：lower(string A)
        返回值：string
        说明：去除字符串两边的空格
        举例：select lower('absSedF') ----> abssedf
        
        trim / ltrim / rtrim：取出空格/左空格/右空格
        
        语法：trim(string A)
        返回值：string
        说明：返回字符串A的小写格式
        举例：select trim(' absSedF ') ----> absSedF
        select length('  absSedF ') ---> 10
        
        space：空格字符串函数
        
        语法: space(int n)
        说明：返回长度为5的空格字符串
        举例：select length(space(5)) ---> 5
        
        repeat：重复字符串函数
        
        语法: repeat(string A, int n)
        返回值：string
        说明：将字符串A重复n遍。
        举例：select repeat('123', 3); ---> 123123123
        
        lpad / rpad：左/右补足函数
        
        语法: lpad(string str, int len, string pad) 、rpad(string str, int len, string pad)
        举例：select lpad('abc',7,'td'); ---> tdtdabc
        select rpad('abc',7,'td'); ---> abctdtd
        
        split(分割字符串)
        
        语法：split(string str, string pat)
        返回值：array
        说明：按照pat分隔符分割字符串str, 返回分割后的字符串数组
        举例：select split('adgncf','n') --> ["adg","cf"]
        
        replace：替换字符串
        
        语法：replace(string A, string B, string C)
        返回值：string
        说明：将字符串A中的子字符串B替换为C。
        举例：select replace('atguigu', 'a', 'A') ---> Atguigu
        
        ^开头、$结尾
        [0-9]任意一个数字
        
        . 代表任意单个字符
        
        + 代表一个或多个匹配，* 代表0或多个匹配
        | 代表或
        
        regexp_replace(正则表达式的替换函数)
        
        语法：regexp_replace(string A, string B, string C)
        返回值：string
        说明：将字符串A中符合java正则表达式的B替换成C；注意有些情况要使用转义字符；
        举例：
        select regexp_replace('foobasr' ,  'oo|ar','')  ----> fbasr
        select regexp_replace('foobasr' ,  'oo|ba','')   --->fsr
        ps:正则表达式中的 | 竖线代表的是：'或'的意思，其中有一个为true，那整体就是true
        正则表达式网站：
        正则表达式可视化-Visual Regexp：在线测试、学习、构建正则表达式正则表达式图形化可视化工具、自动生成正则、常用正则速查、常用语法速查、测试正则语法、测试内容匹配。收录常用正则有URL、邮箱、手机号、身份证等。https://wangwl.net/static/projects/visualRegex#prefix=Z&source=ZidNum
        
        regexp_extract(正则表达式的提取函数)
        
        语法：regexp_extract(string subject, string pattern, int index)
        返回值：string
        说明：将字符串subject 按照pattern正则表达式的规则拆分，返回index指定的字符
        举例：
        select regexp_extract('foothebar','foo(.*?)(bar)' ,0)  ---->  foo
        select regexp_extract('foothebar','foo(.*?)(bar)' ,1)----> the
        select regexp_extract('foothebar','foo(.*?)(bar)' ,2) ---> bar
        上述代码，会将字符串 foothebar 拆分成 foor, the,bar; 对应的index 索引分别是：0,1,2
        
        get_json_object(json解析函数)：从路径中提取json对象
        
        语法：get_json_object(string json_string, string path)
        返回值：string
        说明：解析json 的字符串json_string, 返回path指定的内容，如果输入的json字符串无效，那么返回null
        举例：select get_json_object('{"name":"zs","age":18,"address":"安德门"}','$.name') as name ---> zs
        select get_json_object('[{"name":"大海海","sex":"男","age":"25"},{"name":"小宋宋","sex":"男","age":"47"}]','$.[0].name') ----> 大海海
        
        2.1.5 集合函数
        
        size：集合中元素的个数
        
        # 返回每一行数据中friends集合中的个数
        select size() from test;
        
        map：创建map集合
        
        语法：map (key1, value1, key2, value2, …)
        返回值：map集合
        说明：根据输入的key和value来构建map类型
        举例：select map('xiaohai',1,'dahai',2); --->  {"xiaohai":1,"dahai":2}
        
        map_keys：返回map中的key
        
        语法：map_keys(map (key1, value1, key2, value2,.....))
        返回值：集合
        说明：返回map中的key
        举例：select map_keys(map('xiaohai',1,'dahai',2)); --->["xiaohai","dahai"]
        
        map_values：返回map中的value
        
        语法：map_values(map (key1, value1, key2, value2,.....))
        返回值：array
        说明：返回map中的value
        举例：select map_keys(map('xiaohai',1,'dahai',2)); --->[1,2]
        
        array
        
        语法：array(val1, val2, …)
        返回值：array集合
        说明：根据输入的参数构建数组array类
        举例：select array('1','2','3','4'); --->["1","2","3","4"]
        
        array_contains: 判断array中是否包含某个元素
        
        语法：array_contains(array(val1, val2, …),element)
        返回值：true 或false
        说明：判断array中是否包含某个元素
        举例：select array('1','2','3','4'); --->["1","2","3","4"]
        
        array_contains的具体应用见文章：
        
        HiveSQL题——array_contains函数-CSDN博客文章浏览阅读1.1k次，点赞27次，收藏12次。HiveSQL题——array_contains函数https://blog.csdn.net/SHWAITME/article/details/135960849?spm=1001.2014.3001.5502
        
        sort_array：将array中的元素排序(无法指定排序规则,默认是升序排序)
        
        语法：sort_array(array(val1, val2, …))
        返回值：array
        说明：对array中的元素排序，默认是升序排序
        举例：select sort_array(array('a','d','c')); --->  ["a","c","d"]
        
        2.2 聚合函数
        
        聚合函数是UDAF函数（多进一出）
        
        2.2.1 count(个数统计)
        
        语法：count(*)，count(col)计数
        返回值：int
        说明：count(*)会统计包括null值在内的行;
        count(expr)返回指定字段的个数（会过滤null值）;
        count(distinct expr)返回指定字段的去重后个数(会过滤null值)
        举例：select count(user_id) from table;
        
        2.2.2 sum(总和统计函数)
        
        语法：sum(col)
        返回值：double
        说明：sum(col) 统计某个结果集中col列相加的结果;
        sum(distinct col )某个结果集中col列去重后相加的结果
        举例：select sum(*) from table;
        
        2.2.3 avg(平均值统计函数)
        
        语法：avg(col) ,avg(distinct col)
        返回值：double
        说明：avg(col)统计某个结果集中col列的平均值;
        avg(distinct col)统计某个结果集中col列去重后的平均值
        举例：select avg(score) from table；
        
        2.2.4 min(最小值统计函数)
        
        语法：min(col)
        返回值：double
        说明：统计某个结果集中col列的最小值
        举例：select min(score) from table
        
        2.2.5 max(最大值统计函数)
        
        语法：max(col)
        返回值：double
        说明：统计某个结果集中col列的最大值
        举例：select max(score) from table
        
        总结：
        1.count(*)操作时会统计null值，count(column)会过滤掉null值;
        2.事实上除了count(*)计算，剩余的聚合函数例如： max(column),min(column),avg(column),count(column) 函数会过滤掉null值
        
        2.2.6 高级聚合函数
        
        collect_list : 收集并形成list集合，结果不去重
        
        语法：collect_list(col)
        返回值：array
        说明：在hive中是把一个key的多个信息收集起来合成一个，不去重
        举例：select avg(score) from table；
        
        -- 举例： with tmp as ( select 'a' as test union all select 'c' as test union all select 's' as test union all select 'd' as test ) select collect_list(test) from tmp; 结果：["c","s","d","a"] ，可以看出：聚合后的数组元素无法保证顺序性
        
        collect_set：收集并形成set集合，结果去重
        
        语法：collect_set(col)
        返回值：array
        说明：在hive中是把一个key的多个信息收集起来，去重
        举例：select avg(score) from table；
        
        -- 举例： with tmp as ( select 'a' as test union all select 'c' as test union all select 'a' as test union all select 'd' as test ) select collect_set(test) from tmp; 结果：["a","c","d"]
        
        ps: 聚合函数的具体使用场景见文章：
        
        HiveSQL题——聚合函数(sum/count/max/min/avg)-CSDN博客文章浏览阅读739次，点赞18次，收藏17次。HiveSQL题——聚合函数(sum/count/max/min/avg)https://blog.csdn.net/SHWAITME/article/details/135918264?spm=1001.2014.3001.5501
        
        ps: collect_set() /collect_list()有序性见文章：
        
        HiveSQL题——collect_set()/collect_list()聚合函数-CSDN博客文章浏览阅读380次，点赞6次，收藏9次。HiveSQL题——collect_set()/collect_list()聚合函数https://blog.csdn.net/SHWAITME/article/details/136011647?spm=1001.2014.3001.5502
        
        2.3 炸裂函数
        
        UDTF（Table-Generating Function）定义：接收一行数据，输出一行或多行数据。
        
        炸裂函数的详细知识点见文章：
        
        HiveSQL题——炸裂函数(explode/posexplode)_hive exolode-CSDN博客文章浏览阅读1.1k次，点赞28次，收藏13次。HiveSQL题——炸裂函数(explode/posexplode)_hive exolodehttps://blog.csdn.net/SHWAITME/article/details/135941286?spm=1001.2014.3001.5502
        
        2.3.1 explode(炸裂函数)
        
        (1)explode(array a) --> explode针对数组进行炸裂语法：lateral view explode(split(a,',')) tmp as new_column 返回值：string 说明:按照分隔符切割字符串，并将数组中内容炸裂成多行字符串举例：select student_score from test lateral view explode(split(student_score,',')) tmp as item; 输出结果为： student_score item [a,b,c] => a b c (2)explode(map m) --> explode针对map键值对进行炸裂举例：select explode(map('a',1,'b',2,'c',3)) as (key,value); 输出结果为：得到 key value {a:1,b:2,c:3} => a 1 b 2 c 3
        
        2.3.2 posexplode(炸裂函数，带有下角标pos)
        
        posexplode和explode之间的区别：posexplode除了返回数据，还会返回该值的下角标。 (1)posexplode(array a) 语法：lateral view posexploed(split(a,',')) tmp as pos,item 返回值：string 说明:按照分隔符切割字符串，并将数组中内容炸裂成多行字符串(炸裂具备下角标 0,1,2,3) 举例1：select posexplode (array('a','b','c')) as pos,item; 输出结果为： pos item [a,b,c] => 0 a 1 b 2 c --------------------------------- 举例2：对student_name进行炸裂，同时也对student_score进行炸裂，且需要保证炸裂后，学生和成绩一一对应，不能错乱。 select student_name, student_score from test lateral_view posexplode(split(student_name,',')) sn as student_name_index, student_name laretal view posexplode(split(student_score,',')) sc as student_score_index,student_score where student_name_index = student_score_index ;
        
        2.3.3 lateral view 侧写视图
        
        定义：lateral view 通常与UDTF配合使用，lateral view 可以将UDTF应用到源表的每行数据，将每行数据转换成一行或者多行，并将源表中每行的输出结果与该行连接起来，形成一个虚拟表举例：select id, name, hobbies, hobby from person lateral view explode(hobbies) tmp as hobby; 分析：对源表person中的hobbies列进行炸裂（一行变多行），新字段命名hobby，利用侧视图lateral view 将源表person的每行与hobby连接起来，形成一个虚拟表，命名为tmp。
        
        ps: 炸裂函数的具体使用场景见文章：
        HiveSQL题——数据炸裂和数据合并-CSDN博客文章浏览阅读711次，点赞17次，收藏11次。HiveSQL题——数据炸裂和数据合并https://blog.csdn.net/SHWAITME/article/details/135952216?spm=1001.2014.3001.5501
        
        2.4 窗口函数
        
        2.4.1 聚合函数
        
        聚合函数分类: count()、sum()、min()、max()、avg()
        语法：聚合函数() over( partition by column order by column rows / range between..................)
        
        ps: 聚合函数的具体使用场景见文章：
        
        HiveSQL题——聚合函数(sum/count/max/min/avg)-CSDN博客文章浏览阅读739次，点赞18次，收藏17次。HiveSQL题——聚合函数(sum/count/max/min/avg)https://blog.csdn.net/SHWAITME/article/details/135918264?spm=1001.2014.3001.5501
        
        2.4.2 前后函数
        
        lag
        
        -- 取得column列前边的第n行数据，如果存在则返回，如果不存在，返回默认值default lag(column,n,default) over(partition by ....order by....)
        
        lead
        
        -- 取得column列后边的第n行数据，如果存在则返回，如果不存在，返回默认值default lead(column,n,default) over(partition by... order by...)，
        
        lag和lead函数不支持自定义窗口。
        
        ps: 前后函数的具体使用场景见文章：
        
        HiveSQL题——前后函数(lag/lead)-CSDN博客文章浏览阅读1.2k次，点赞23次，收藏21次。HiveSQL题——前后函数(lag/lead)https://blog.csdn.net/SHWAITME/article/details/135902998?spm=1001.2014.3001.5502
        
        2.4.3 头尾函数
        
        first_value
        
        ---当前窗口column列的第一个数值，如果有null值，则跳过 first_value(column,true) over (partition by ..order by.. 窗口子句) ---当前窗口column列的第一个数值，如果有null值，不跳过 first_value(column,false) over (partition by ..order by.. 窗口子句)
        
        last_value
        
        --- 当前窗口column列的最后一个数值，如果有null值，则跳过 last_value(column,true) over (partition by ..order by.. 窗口子句) --- 当前窗口column列的最后一个数值，如果有null值,不跳过 last_value(column,false) over (partition by ..order by.. 窗口子句)
        
        头尾函数支持自定义窗口子句（rows between unbounded preceding and unbounded following ）
        
        2.4.4 排名函数
        
        row_number() over(partition by .. order by .. ) 顺序排序（行号）——1、2、3 rank() over(partition by .. order by .. ) 并列排序，跳过重复序号——1、1、3 dense_rank() over(partition by .. order by .. ) 并列排序，不跳过重复序号——1、1、2
        
        排名函数不支持自定义窗口子句
        
        ps：排序函数的具体使用场景见文章：
        
        HiveSQL题——排序函数(row_number/rank/dense_rank)-CSDN博客文章浏览阅读934次，点赞20次，收藏15次。HiveSQL题——排序函数(row_number/rank/dense_rank)https://blog.csdn.net/SHWAITME/article/details/135909662?spm=1001.2014.3001.5501

分类:热门推荐日期:2024-03-25浏览:1评论:0

一、各数据类型的基础知识点

1.1 数值类型

整数

小数

float

double(常用)

decimal(针对高精度)

1.2 日期类型

date

datetime

timestamp

time

year

1.3 字符串类型

char

varchar / varchar2

blob /text

tinyblob / tinytext

mediumblob / mediumtext

longblob / longtext

string(常用)

二、Hive中的常用函数

2.1 单行函数

2.1.1 数值转换函数

round

取整

（可指定精度）四舍五入

floor(向下取整函数)

ceil(向上取整函数)

abs(绝对值函数)

rand(取随机数函数)

cast：类型转换

2.1.2 日期函数

from_unixtime(时间戳转日期函数)

unix_timestamp(日期转时间戳函数)

current_date：当前日期

current_timestamp：当前日期加时间，精确到毫秒

to_date(日期时间转日期函数)

year(日期转年函数)

month(日期转月函数)

day(日期转天函数)

hour(日期转小时函数)

minute(日期转分钟函数)

second(日期转秒函数)

weekofyear(日期转周函数)

datediff(日期比较函数)

date_add(日期增加函数)

date_sub(日期减少函数)

months_between

2.1.3 流程控制函数

if：条件判断

coalesce() ：空字段赋值

nvl：空字段赋值

case when：条件判断

2.1.4 字符串函数

length(字符串长度函数)

reverse(字符串反转函数)

concat(字符串拼接函数)

concat_ws(带分隔符的字符串连接函数)

substr / substring(字符串截取函数)

upper / ucase (字符串转大写函数)

lower / lcase (字符串转小写函数)

trim / ltrim / rtrim：取出空格/左空格/右空格

space：空格字符串函数

repeat：重复字符串函数

lpad / rpad：左/右补足函数

split(分割字符串)

replace：替换字符串

regexp_replace(正则表达式的替换函数)

regexp_extract(正则表达式的提取函数)

get_json_object(json解析函数)：从路径中提取json对象

2.1.5 集合函数

size：集合中元素的个数

map：创建map集合

map_keys：返回map中的key

map_values：返回map中的value

2.2 聚合函数

2.2.1 count(个数统计)