ClickHouse之Explain查看执行计划网站首页 学无止境

ClickHouse之Explain查看执行计划

Java技术债务 2024-06-13 12:01:02

简介ClickHouse之Explain查看执行计划

文章目录

前言
基本语法
EXPLAIN 类型
补充
忠告

前言

在 clickhouse 20.6 版本之前要查看 SQL 语句的执行计划需要设置日志级别为 trace 才能可以看到，并且只能真正执行 sql，在执行日志里面查看。在 20.6 版本引入了原生的执行计划的语法。在 20.6.3 版本成为正式版本的功能。

基本语法

EXPLAIN [AST | SYNTAX | QUERY TREE | PLAN | PIPELINE | ESTIMATE | TABLE OVERRIDE] [setting = value, ...]
    [
      SELECT ... |
      tableFunction(...) [COLUMNS (...)] [ORDER BY ...] [PARTITION BY ...] [PRIMARY KEY] [SAMPLE BY ...] [TTL ...]
    ]
    [FORMAT ...]

示例：

explain select * from td_xplatform.stats_sd_campaign where profileId = 113434409578602 and date between '20220101' and '20220101';
-- 结果
Expression ((Projection + Before ORDER BY))
  ReadFromMergeTree (td_xplatform_local.stats_sd_campaign)

注意：ClickHouse官网也并没有过多讲解执行计划具体详细的解释，通过对sql执行的步骤分析出ClickHouse的执行计划是从底部往上逐一执行。

EXPLAIN 类型

PLAN：用于查看执行计划，默认值。
AST：用于查看语法树;
SYNTAX：用于优化语法，有时我们指定的查询语句未必是最优的，那么 ClickHouse 在底层会进行优化，EXPLAIN SYNTAX 可以返回对一条 SQL 语句进行优化后的结果。通过对比优化前和优化后的 SQL 语句，可以有助于我们理解 ClickHouse 的优化机制
PIPELINE：用于查看 PIPELINE 计划。
ESTIMATE：显示处理查询时要从表中读取的估计行数、标记数和部分数

注意：
explain默认值PLAN的结果不是那么的通俗易懂，可以使用setting进行查看详细的执行计划，判断sql是否合理化。

EXPLAIN PLAN

用于查看执行计划，默认值。以下是setting的key：

header：打印计划中各个步骤的 head 说明，默认关闭，默认值 0;
description：打印计划中各个步骤的描述，默认开启，默认值 1;
actions：打印计划中各个步骤的详细信息，默认关闭，默认值0。
json：以JSON格式将查询计划步骤打印为一行。默认：0
indexes：显示使用的索引、过滤部分的数量以及应用的每个索引的过滤颗粒数。默认值：0。支持MergeTree表。
- **NameSkip ：**索引名称（目前仅用于索引）。
- **Keys ：**索引使用的列数组。
- Condition：使用情况。
- DescriptionSkip：索引描述（目前仅用于索引）。
- Parts：应用索引之前/之后的部分数。
- Granules：应用索引之前/之后的颗粒数。

示例如下：

explain actions=1
select id,sum(totalCost) Spend
from xplatform_sd.sd_campaign ca
join td_xplatform.stats_sd_campaign report on ca.amazon_campaign_id = report.campaignId
where profileId = 113434409578602
group by id
order by Spend desc ;

-- 执行计划
Expression (Projection)
Actions: INPUT :: 0 -> id Int32 : 0
         INPUT : 1 -> sum(totalCost) Nullable(Decimal(38, 2)) : 1
         ALIAS sum(totalCost) :: 1 -> Spend Nullable(Decimal(38, 2)) : 2
Positions: 0 2
  Sorting (Sorting for ORDER BY)
  Sort description: sum(totalCost) DESC
    Expression (Before ORDER BY)
    Actions: INPUT :: 0 -> id Int32 : 0
             INPUT :: 1 -> sum(totalCost) Nullable(Decimal(38, 2)) : 1
    Positions: 0 1
      Aggregating
      Keys: id
      Aggregates:
          sum(totalCost)
            Function: sum(Nullable(Decimal(9, 2))) → Nullable(Decimal(38, 2))
            Arguments: totalCost
            Argument positions: 1
        Expression (Before GROUP BY)
        Actions: INPUT :: 0 -> id Int32 : 0
                 INPUT :: 1 -> totalCost Nullable(Decimal(9, 2)) : 1
        Positions: 0 1
          Filter (WHERE)
          Filter column: equals(profileId, 113434409578602) (removed)
          Actions: INPUT :: 0 -> id Int32 : 0
                   INPUT : 1 -> profileId Int64 : 1
                   INPUT :: 2 -> totalCost Nullable(Decimal(9, 2)) : 2
                   COLUMN Const(UInt64) -> 113434409578602_8 UInt64 : 3
                   FUNCTION equals(profileId :: 1, 113434409578602_8 :: 3) -> equals(profileId, 113434409578602) UInt8 : 4
          Positions: 0 2 4
            Join (JOIN)
              Expression (Before JOIN)
              Actions: INPUT :: 0 -> id Int32 : 0
                       INPUT :: 1 -> amazon_campaign_id Nullable(Int64) : 1
                       INPUT : 2 -> profile_id Nullable(Int64) : 2
                       COLUMN Const(UInt64) -> 113434409578602 UInt64 : 3
                       FUNCTION equals(profile_id :: 2, 113434409578602 :: 3) -> equals(profile_id, 113434409578602) Nullable(UInt8) : 4
              Positions: 0 1 4
                SettingQuotaAndLimits (Set limits and quota after reading from storage)
                  ReadFromStorage (MySQL)
              Expression ((Joined actions + Rename joined columns))
              Actions: INPUT : 0 -> profileId Int64 : 0
                       INPUT : 1 -> campaignId Int64 : 1
                       INPUT : 2 -> totalCost Nullable(Decimal(9, 2)) : 2
                       ALIAS profileId :: 0 -> profileId Int64 : 3
                       ALIAS campaignId :: 1 -> campaignId Int64 : 0
                       ALIAS totalCost :: 2 -> totalCost Nullable(Decimal(9, 2)) : 1
              Positions: 0 3 1
                SettingQuotaAndLimits (Set limits and quota after reading from storage)
                  Expression ((Projection + Before ORDER BY))
                  Actions: INPUT :: 0 -> profileId Int64 : 0
                           INPUT :: 1 -> campaignId Int64 : 1
                           INPUT :: 2 -> totalCost Nullable(Decimal(9, 2)) : 2
                  Positions: 0 1 2
                    SettingQuotaAndLimits (Set limits and quota after reading from storage)
                      ReadFromMergeTree
                      ReadType: Default
                      Parts: 843
                      Granules: 936

简单解释一下特殊的名词：

SettingQuotaAndLimits： Set limits and quota after reading from storage（从存储读取后设置限制和配额）
Positions：当前执行计划的位置
ReadFromMergeTree：从MergeTree读取数据
ReadFromStorage：从存储读取数据
Filter (WHERE)：是 WHERE 或 HAVING 的实现。拦截过滤数据
Sorting (Sorting for ORDER BY)：排序
Expression：计算列上的函数，例如 x, y -> x + 1, y * 2。使用 explain actions = 1 select … 查看更多详细信息
Aggregating：负责 GROUP BY。

EXPLAIN AST

查看查询的抽象语法树（AST）。支持所有类型的查询，而不仅仅是 SELECT。

示例如下：

EXPLAIN AST ALTER TABLE t1 DELETE WHERE date = today();
--执行计划
AlterQuery  t1 (children 1)
   ExpressionList (children 1)
    AlterCommand 27 (children 1)
     Function equals (children 1)
      ExpressionList (children 2)
       Identifier date
       Function today (children 1)
        ExpressionList

EXPLAIN SYNTAX

在语法优化后返回查询。

可以使用此语法进行判断sql是否是当前查询方式的最佳性能，可以查看一些sql是会进行谓词下推

示例如下：

explain syntax
select target.expression,
       sum(totalCost) Spend
from (select targetId, totalCost,date from td_xplatform.stats_sd_target where  profileId = 113434409578602) report
left join (select amazon_target_id,expression from xplatform_sd.sd_product_ad_target where profile_id = 113434409578602) target on report.targetId = target.amazon_target_id
where date between '20220101' and '20220202'
group by target.expression
order by Spend;
-- 执行计划如下
SELECT
    expression,
    sum(totalCost) AS Spend
FROM
(
    SELECT
        targetId,
        totalCost,
        date
    FROM td_xplatform.stats_sd_target
    WHERE (profileId = 113434409578602) AND ((date <= '20220202') AND (date >= '20220101'))
) AS report
ALL LEFT JOIN
(
    SELECT
        amazon_target_id,
        expression
    FROM xplatform_sd.sd_product_ad_target
    WHERE profile_id = 113434409578602
) AS target ON targetId = amazon_target_id
WHERE (date >= '20220101') AND (date <= '20220202')
GROUP BY expression
ORDER BY Spend ASC

EXPLAIN PIPELINE

header：打印计划中各个步骤的 head 说明，默认关闭;
graph：用DOT图形语言描述管道图，默认关闭，需要查看相关的图形需要配合graphviz 查看;
compact：如果开启了 graph，紧凑打印行开关。1：开启（默认值），0：关闭；

示例如下：

explain pipeline 
select sum(totalCost) 
from td_xplatform.stats_sd_campaign 
where profileId = 113434409578602
group by campaignId;
-- 结果如下
(SettingQuotaAndLimits)
  (Expression)
  ExpressionTransform
    (Aggregating)
    Resize 4 → 1
      AggregatingTransform × 4
        StrictResize 4 → 4
          (Expression)
          ExpressionTransform × 4
            (SettingQuotaAndLimits)
              (ReadFromMergeTree)
              MergeTreeThread × 4 0 → 1

简单解释一下特殊的名词：

SettingQuotaAndLimits： Set limits and quota after reading from storage（从存储读取后设置限制和配额）
AggregatingTransform x 4：4次聚合转换，是 Group By 高性能的核心所在。原因是因为ClickHouse是一个很耗cpu的数据库，会在不同的线程上进行统计计算，有多少线程取决于ClickHouse服务器多少核，多少线程。
ReadFromMergeTree：从MergeTree读取数据到内存中

EXPLAIN ESTIMATE

显示处理查询时要从表中读取的估计行数、标记数和部分数。使用MergeTree系列中的表。

示例如下：

explain estimate select sum(totalCost) from td_xplatform.stats_sd_target where profileId = 113434409578602 and date between '20220101' and '20220202';