Support Runtime Filter in TiDB #40220

elsa0520 · 2022-12-28T09:17:57Z

Goal

Improve the efficiency of Join execution in OLAP scenarios.

Background

PhysicalHashJoin completes the Join by constructing a HashTable from the data in the right table, and the data in the left table keeps probing the HashTable. If the Join Key value in part of the Probe process cannot Hit HashTable, then this part of the data in the description does not exist in the right table and will not appear in the last Join result.

In other words, this part of the data from PhysicalTableScan to send to PhysicalHashJoin is meaningless. If you can filter out this part of the Join Key data in advance, it will reduce scanning time and network overhead, thereby greatly improving Join efficiency .

Runtime Filter

Runtime Filter is a predicate that is generated during query planning The predicate that takes values dynamically. This predicate has the same function as other predicates in Selection. It is applied to filter rows in TableScan that do not meet the predicate condition.

The only difference is that the parameter values of the Runtime Filter predicate are constructed in HashJoin.

The following is an example:
T1 is a fact table with 100w data rows and T2 is a dimension table with 100 data rows.
select * from T1, T2 where T1.id=T2.id
The RF execution method is to scan the data of T2 first, PhysicalHashJoin calculates a filter based on the data of T2, such as the maximum and minimum values of the T2 data. Then send this filter to TableFullScan waiting to scan T1. T1 then applies this filter and gives the filtered data to PhysicalHashJoin, thereby reducing the rows of probe hash tables and network overhead.

                         2. build RF values
            +-------->+-------------------+
            |         |PhysicalHashJoin   |<-----+
            |    +----+                   |      |
4.after RF  |    |    +-------------------+      | 1. scan T2
    5000    |    |3. send RF                     |      100
            |    | filter data                   |
            |    |                               |
      +-----+----v------+                +-------+--------+
      |  TableFullScan  |                | TabelFullScan  |
      |      T1         |                |      T2        |
      +-----------------+                +----------------+

In summary, Runtime Filters are generated in the planner, built in PhysicalHashJoin, and applied in TableScan.

Type of predicate

IN, it is suitable for the case where the Join Key column has a small number of values. The predicate is of the form T1.id in (1, 5, 7, 8, 10)
MIN MAX, suitable for columns where the Join Key column is a range value. T1.id > 1 and T1.id < 10
Bloom Filter, suitable for situations where there are many values in the Join Key column.

TiFlash or TiKV

There are two storage engine in TiDB: TiFlash and TiKV. To achieve the best efficiency, RF necessary to combine the "predicate push down" and "delayed materialization" of the storage engine. However, TiKV does not have the function of delayed materialization because it is line storage engine, so RF applied to TiKV is not ideal.
Only TiFlash supports RF.

Precondition

Delayed materialization (in design), @Lloyd-Pottiger https://github.com/pingcap/tidb/pull/39654/files
Bloom Filter Index and Bloom Filter Expression @windtalker
Asynchronous read @windtalker

Stage goal

The first phase supports Broadcast local, IN the situation.
The second phase supports Shuffle Global's RF, as well as Bloom Filter, Min Max, BF/IN.

Design

Runtime Filter needs to be planned in the Planner and executed in Compute Engine. So the design is mainly divided into two parts:

Planner: Based on the matching pattern, generate RF and assign to the correct target node. ( TiDB )
Compute Engine: HashJoinNode builds RF and applies to target node ( TiFlash )
This article does not cover the design of Global's RF.

Planner design

The planner part of the RF is mainly used to identify HashJoin nodes through the structure of Tree, and plan the RF that meet the Patterns to the corresponding target node. According to the above logic, the Planner can be split into two core processes:

Generate Filter: Responsible for matching HashJoinNode that meets the Patterns, and combining EqPredicates on HashJoinNode to construct RF.
Assign Filter: Find the target node according to the constructed RF.

Process
Preorder traversal PhysicalPlanTree. Generate Filter when encountering PhysicalHashJoin node, Assign Filter when encountering TableFullScan.

Generate Filter

Patterns are following:

PhysicalHashJoin : exclude Left outer，full outer，anti join
Predicate

eq predicate
The left child is only column expr, or cast (column expr).
Right child types: numeric type, bool, character type. (Other json, blob, binary, etc. are not supported)
Evaluate selectivity

Assign Filter

The relationship between the Hash Join node and the Target Node node is shown in the figure below, and in real cases, one or more nodes may be interspersed between the two.

If TableScan contains the same Column as RF (the same unique id), then this TableScan will be assigned the current RF.

Principle: RF column can be pushed down as long as it has not undergone any other transformation, and the untransformed RF column is the same as the column unique id in TargetNode.

Compute Engine design

The main work of the Runtime Filter at the execution layer is as follows:

Compile: Translate the pb structure into executable RF, and associate the Build and Probe sides through Ref.
Generator Values: During the execution of HashBuild, the values of the required columns in each Block are continuously collected and RF expressions are constructed
Apply RF: When RF is ready, push RF down to the storage engine as a normal expression.

Compile

As shown in the figure, if a RF is encountered in the Compile stage, the corresponding vector < RuntimeFilter > is constructed RF and the reference is stored in HashBuild and Unordered respectively. The two are held together, one is responsible for constructing and the other is responsible for using.

Generator Values

Generator Values continuously obtain new Block data through the read method of HashBuild and obtain the data in the Block according to the column name required by the RF and increment to update the RuntimeFilter.

RF cost control
If IN values are too many, it may cause the RF to run too inefficiently. So we need to add a runtime_filter_max_in_values variable here to control that if the maximum value is exceeded, RF is discarded.

Apply RF

The role of the Apply RF is mainly to RF application to the Scan thread, and its location is more important in the whole process. The following are two prerequisites:

After RF ready: Occurs after RF construction. Since RF must wait for the Probe to complete the collection of Join Values before it can be applied by the Target Node. Therefore, the TargetNode's read must occur after RF Read.
- is_ready: HashJoin and TargetNode jointly hold is_ready variable. TargetNode will not enable scan out of the box until is_ready variable is set to true.
Before scan thread: Occurs before the scan thread starts. If the scan of the storage layer has already started, even after the RF is built, it cannot be pushed down to the storage layer for filtering. It cannot reduce the IO effect.
- runtime_filter_wait_time_ms: Maximum waiting time for the runtime filter. If this waiting time is exceeded, no RF is performed.

Added session variables

runtime_filter_type：in，min_max, bf，in/bf。
runtime_filter_mode：local，global
runtime_filter_wait_time_ms: the most time spent waiting for the target node, more than this time does not wait for RF. (for TiFlash for)
runtime_filter_in_max_values: The number of values used to control IN this RF, if it exceeds RF will not be run. (for TiFlash for)

Observability

Explain: RF id, RF producer, PF consumer, reference column. Used to check the correctness of the planner layer RF.

MySQL [ml_test]> explain select * from  t1, t2 where t1.k2=t2.k2;
+------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------+
| id                           | estRows  | task      | access object | operator info                                                             |
+------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------+
| HashJoin_8                   | 12487.50 | root      |               | inner join, equal:[eq(ml_test.t1.k2, ml_test.t2.k2)], rf_1: producer t2.k2|
| ├─TableReader_15(Build)      | 9990.00  | root      |               | data:Selection_14                                                        |
| │ └─Selection_14             | 9990.00  | cop[tikv] |               | not(isnull(ml_test.t2.k2))                                              |
| │   └─TableFullScan_13       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                          |
| └─TableReader_12(Probe)      | 9990.00  | root      |               | data:Selection_11                                                        |
|   └─Selection_11             | 9990.00  | cop[tikv] |               | not(isnull(ml_test.t1.k2))                                               |
|     └─TableFullScan_10       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo , rf_1: consumer t1.k2                    |
+------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------+

Profile display for RF performance tuning and execution data presentation.
- RF waiting time
- RF construction time
- Changes in data volume before and after RF
- etc

The text was updated successfully, but these errors were encountered:

elsa0520 · 2022-12-28T10:42:01Z

@fixdb Could you please assign to me and add label ?

fixdb · 2022-12-29T02:43:54Z

/assign @elsa0520

…l RF (#42899) close #40220

elsa0520 added the type/feature-request Categorizes issue or PR as related to a new feature. label Dec 28, 2022

ti-chi-bot assigned elsa0520 Dec 29, 2022

ti-chi-bot bot closed this as completed in #42899 May 31, 2023

ti-chi-bot bot pushed a commit that referenced this issue May 31, 2023

planner, runtimefilter: Runtime filter generator, Support IN and Loca…

91185a1

…l RF (#42899) close #40220

yibin87 mentioned this issue Aug 3, 2023

Expand the functionality of Local Runtime Filter pingcap/tiflash#7891

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Runtime Filter in TiDB #40220

Support Runtime Filter in TiDB #40220

elsa0520 commented Dec 28, 2022 •

edited

Loading

elsa0520 commented Dec 28, 2022 •

edited

Loading

fixdb commented Dec 29, 2022

Support Runtime Filter in TiDB #40220

Support Runtime Filter in TiDB #40220

Comments

elsa0520 commented Dec 28, 2022 • edited Loading

Goal

Background

Runtime Filter

Type of predicate

TiFlash or TiKV

Precondition

Stage goal

Design

Planner design

Generate Filter

Assign Filter

Compute Engine design

Compile

Generator Values

Apply RF

Added session variables

Observability

elsa0520 commented Dec 28, 2022 • edited Loading

fixdb commented Dec 29, 2022

elsa0520 commented Dec 28, 2022 •

edited

Loading

elsa0520 commented Dec 28, 2022 •

edited

Loading