SQL窗口函数

11 Apr 2020

SQL面向的数据结构: 二维表

窗口: 数据行的上下文

语法

func(col) over ([partition by col...] [order by ...] [rows between ...])   

例子

mysql> select *,
    -> row_number() over w,
    -> rank() over w,
    -> dense_rank() over w,
    -> round(cume_dist() over w, 2),
    -> round(percent_rank() over w, 2)
    -> from vs
    -> window w as (order by v)
    -> ;
+----+------+---------------------+---------------+---------------------+------------------------------+---------------------------------+
| ts | v    | row_number() over w | rank() over w | dense_rank() over w | round(cume_dist() over w, 2) | round(percent_rank() over w, 2) |
+----+------+---------------------+---------------+---------------------+------------------------------+---------------------------------+
|  7 |    1 |                   1 |             1 |                   1 |                         0.10 |                            0.00 |
|  4 |    2 |                   2 |             2 |                   2 |                         0.30 |                            0.11 |
|  6 |    2 |                   3 |             2 |                   2 |                         0.30 |                            0.11 |
|  2 |    4 |                   4 |             4 |                   3 |                         0.40 |                            0.33 |
|  5 |    5 |                   5 |             5 |                   4 |                         0.60 |                            0.44 |
| 10 |    5 |                   6 |             5 |                   4 |                         0.60 |                            0.44 |
|  8 |    8 |                   7 |             7 |                   5 |                         0.80 |                            0.67 |
|  9 |    8 |                   8 |             7 |                   5 |                         0.80 |                            0.67 |
|  3 |    9 |                   9 |             9 |                   6 |                         1.00 |                            0.89 |
| 11 |    9 |                  10 |             9 |                   6 |                         1.00 |                            0.89 |
+----+------+---------------------+---------------+---------------------+------------------------------+---------------------------------+
10 rows in set (0.00 sec)


mysql> select *,
    -> sum(v) over (),
    -> sum(v) over w,
    -> round(sum(v) over w / sum(v) over (), 2) as ratio,
    -> max(v) over (order by ts rows between unbounded preceding and 1 preceding) as history_max,
    -> round(avg(v) over (order by ts rows between 3 preceding and current row), 2) as prev_3d_avg,
    -> greatest(lead(v, 1) over w, lead(v, 2) over w, lead(v, 3) over w) as next_3d_max,
    -> v-lag(v) over w as v_diff
    -> from vs
    -> window w as (order by ts)
    -> order by ts
    -> ;
+----+------+----------------+---------------+-------+-------------+-------------+-------------+--------+
| ts | v    | sum(v) over () | sum(v) over w | ratio | history_max | prev_3d_avg | next_3d_max | v_diff |
+----+------+----------------+---------------+-------+-------------+-------------+-------------+--------+
|  2 |    4 |             53 |             4 |  0.08 |        NULL |        4.00 |           9 |   NULL |
|  3 |    9 |             53 |            13 |  0.25 |           4 |        6.50 |           5 |      5 |
|  4 |    2 |             53 |            15 |  0.28 |           9 |        5.00 |           5 |     -7 |
|  5 |    5 |             53 |            20 |  0.38 |           9 |        5.00 |           8 |      3 |
|  6 |    2 |             53 |            22 |  0.42 |           9 |        4.50 |           8 |     -3 |
|  7 |    1 |             53 |            23 |  0.43 |           9 |        2.50 |           8 |     -1 |
|  8 |    8 |             53 |            31 |  0.58 |           9 |        4.00 |           9 |      7 |
|  9 |    8 |             53 |            39 |  0.74 |           9 |        4.75 |        NULL |      0 |
| 10 |    5 |             53 |            44 |  0.83 |           9 |        5.50 |        NULL |     -3 |
| 11 |    9 |             53 |            53 |  1.00 |           9 |        7.50 |        NULL |      4 |
+----+------+----------------+---------------+-------+-------------+-------------+-------------+--------+
10 rows in set (0.00 sec)

SQL执行计划阶段

  1. where
  2. group by
  3. having
  4. window
  5. order by
  6. limit offset

因此:

另外窗口函数不能嵌套调用.

业务场景

array_* 实现类似功能套路

Clickhouse暂不支持相关功能

缺点:

(不)相关SQL特性

grouping sets / cube / rollup

一次性多维度统计汇总.

窗口函数直接计算到每一行:

sum(v) over (partition by d1, d2),
sum(v) over (partition by d1),
sum(v) over (partition by d2),

with (Common Table Expressions)

个人理解: 子查询临时表别名, 面向过程SQL编程, 避免过深的嵌套子查询的阅读困难

Clickhouse中全局常量的with使用场合, 约等于全窗口计算.

with (select sum(bytes) from system.parts where active) as s
select table, sum(bytes) / s from system.parts group by table

select *, s / sum(s) over () from (
select table, sum(bytes) as s from parts group by table
) 

全窗口计算值应该不会每一行重复计算吧.

数据库支持情况

References

HOME