Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FixedWidthRows for the arrow-row #4524

Closed
wants to merge 2 commits into from

Conversation

yahoNanJing
Copy link

@yahoNanJing yahoNanJing commented Jul 14, 2023

Which issue does this PR close?

Closes #4523.

Rationale for this change

There are mainly two reasons to introduce the FixedWidthRows:

  • For group value of fixed length, we don't need to store the offsets vector to get the row value, which makes the operation of getting the row value much more efficient. It's also proved by the benchmark results, especially for tpch q17.
  • By introducing the row_width to the RowConverter, it will be much easier for the Datafusion side to decide whether to make the group value embedded or not. The benchmark results also prove that when the group values are of small enough fixed length, to embed the value to the RawTable will bring great benefits.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Jul 14, 2023
@yahoNanJing yahoNanJing marked this pull request as draft July 14, 2023 12:17
@yahoNanJing yahoNanJing force-pushed the issue-4523 branch 2 times, most recently from fe13b27 to 68085f7 Compare July 14, 2023 13:29
@yahoNanJing
Copy link
Author

yahoNanJing commented Jul 14, 2023

With the dependency of this PR, the tpch 1G benchmark for the datafusion, https://github.com/yahoNanJing/arrow-datafusion/tree/EBAY-KYLIN-4003-5, is as follows:

  • single partition
Screen Shot 2023-07-14 at 21 08 04
  • the default partitions
Screen Shot 2023-07-14 at 22 44 54

@yahoNanJing yahoNanJing marked this pull request as ready for review July 14, 2023 14:46
@yahoNanJing yahoNanJing requested review from alamb and tustvold July 14, 2023 14:46
@yahoNanJing
Copy link
Author

Next step, I will do some more optimization at the datafusion side to push the fixed row value to the RawTable value if the row width is less than a threshold.

@tustvold
Copy link
Contributor

I think I would like to hold off on merging this for a bit, I want to do some other experiments first before committing to this approach

@alamb
Copy link
Contributor

alamb commented Jul 14, 2023

@yahoNanJing can you please explain the rationale for the introduction of a fixed width row to a variable length one?

@yahoNanJing
Copy link
Author

yahoNanJing commented Jul 17, 2023

After embedding the value to the RawTable, for tpch q17, there's around 30% performance gain compared to the main branch.

  • single partition
    Screen Shot 2023-07-17 at 13 42 41

  • default partition
    Screen Shot 2023-07-17 at 13 47 08

@alamb
Copy link
Contributor

alamb commented Jul 17, 2023

More context for anyone else following along here can be found in apache/datafusion#6969

I would be very interested in trying to specialize the row_hash grouping operator t avoid the Row format entirely (and use a native Vec<T> for primitive types) and see how that compares with the fixed width row

@yahoNanJing
Copy link
Author

I would be very interested in trying to specialize the row_hash grouping operator t avoid the Row format entirely (and use a native Vec for primitive types) and see how that compares with the fixed width row

Hi @alamb, there are three reasons that I use the fixed width row:

  • It covers the null value
  • It may contain multiple columns not just for one column
  • For single column case, to embed a variable length column value, like String, to the RawTable may not be good.

@alamb
Copy link
Contributor

alamb commented Jul 17, 2023

For single column case, to embed a variable length column value, like String, to the RawTable may not be good.

Just to be clear, what I was imagining for the group storage is not to change the contents of the RawTable (it will continue to contain group_indexes).

But instead of storing group_values using the arrow Row format

 stores "group       stores group values, 
    indexes"          in arrow_row format 
                                          
 ┌─────────────┐      ┌────────────┐      
 │   ┌─────┐   │      │ ┌────────┐ │      
 │   │  5  │   │ ┌────┼▶│  "A"   │ │      
 │   ├─────┤   │ │    │ ├────────┤ │      
 │   │  9  │   │ │    │ │  "Z"   │ │      
 │   └─────┘   │ │    │ └────────┘ │      
 │     ...     │ │    │            │      
 │   ┌─────┐   │ │    │    ...     │      
 │   │  1  │───┼─┘    │            │      
 │   ├─────┤   │      │            │      
 │   │ 13  │───┼─┐    │ ┌────────┐ │      
 │   └─────┘   │ └────┼▶│  "Q"   │ │      
 └─────────────┘      │ └────────┘ │      
                      │            │      
                      └────────────┘      
                                          
                                          
                                          
       map            group_values        
  (Hash Table)                            
                                                                                                
                                                                               

We would instead store the group values using a native type like Vec<T> like this

 stores "group               stored in a      
    indexes"                native Vec<T>     
                                              
 ┌─────────────┐            ┌──────────┐      
 │   ┌─────┐   │            │ ┌──────┐ │      
 │   │  5  │   │    ┌───────┼▶│  1   │ │      
 │   ├─────┤   │    │       │ ├──────┤ │      
 │   │  9  │   │    │       │ │  3   │ │      
 │   └─────┘   │    │       │ └──────┘ │      
 │     ...     │    │       │          │      
 │   ┌─────┐   │    │       │    ...   │      
 │   │  1  │───┼────┘       │          │      
 │   ├─────┤   │            │          │      
 │   │ 13  │───┼────┐       │ ┌──────┐ │      
 │   └─────┘   │    └───────┼▶│  5   │ │      
 └─────────────┘            │ └──────┘ │      
                            │          │      
                            └──────────┘      
                                              
                                              
                            group_values      
       map                                    
  (Hash Table)                                
                                              

I agree the null value would need some special handling, but since this would only be for single columns (where there can be at most one null value) I think we could figure out some way to handle it

@yahoNanJing
Copy link
Author

yahoNanJing commented Jul 17, 2023

At my side, the bottleneck is the get_mut() of the RawTable. And the biggest improvement of this PR brings is to embed the value to the RawTable, which speed up the get_mut() very much.

I don't know whether it will bring much benefit by just changing the Row to the Vec and I'm looking forward to your benchmark results 😄

@alamb
Copy link
Contributor

alamb commented Jul 18, 2023

At my side, the bottleneck is the get_mut() of the RawTable. And the biggest improvement of this PR brings is to embed the value to the RawTable, which speed up the get_mut() very much.

Ah, that is a good insight -- the change yahoNanJing/arrow-datafusion@a18ac07 embeds the group value directly into the table 🤔

Another experiment we could try would be to use unsafe accesses to the group_values table / rows 🤔

Update: I am running this experiment with apache/datafusion#7010

I will also try and compare it to @yahoNanJing 's approach

@alamb
Copy link
Contributor

alamb commented Jul 18, 2023

Here are my benchmark results: apache/datafusion#6969 (comment) (TLDR they are mixed)

@yahoNanJing
Copy link
Author

Hi @alamb and @tustvold, how's your testing going on? And what do you think of this proposal?

@tustvold
Copy link
Contributor

I hope to have a PR up tomorrow that will special case single columns, and will be able to get some numbers for you. I am very keen to avoid this complexity if we can get the same or better performance in a simpler way

@yahoNanJing
Copy link
Author

@tustvold, the simpler is the better. I agree to adopt a simpler way if we can achieve the same or similar performance gain.

@alamb
Copy link
Contributor

alamb commented Jul 20, 2023

As @tustvold mentions, using native types for single columns should be significantly faster than any row format (thought we need to prove that). I do wonder if fixed width formats would be useful for multi-column equality comparisons (again we would need to show this via benchmarks, etc)

I wonder if one of the hesitations with the "FixedWidthRows" describes how this is implemented rather than its major usecase. Maybe we could name it something like "EqualityRows" or something given it focuses just on equality rather than ordering

@tustvold
Copy link
Contributor

I do wonder if fixed width formats would be useful for multi-column equality comparisons

Do we have any benchmarks that are grouping by multiple primitive columns? Whilst I appreciate that optimising for benchmarks is a form of observability bias, it can be a useful way to focus our efforts where they will have the most impact?

using native types for single columns should be significantly faster than any row format

My initial experiments have

--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ specialize-primitive-group-values ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 385.21ms │                          387.08ms │     no change │
│ QQuery 2     │ 102.04ms │                           84.20ms │ +1.21x faster │
│ QQuery 3     │ 104.50ms │                          103.45ms │     no change │
│ QQuery 4     │  69.37ms │                           71.72ms │     no change │
│ QQuery 5     │ 234.38ms │                          236.49ms │     no change │
│ QQuery 6     │  28.62ms │                           28.05ms │     no change │
│ QQuery 7     │ 550.32ms │                          583.28ms │  1.06x slower │
│ QQuery 8     │ 160.42ms │                          152.87ms │     no change │
│ QQuery 9     │ 348.10ms │                          345.23ms │     no change │
│ QQuery 10    │ 201.04ms │                          204.01ms │     no change │
│ QQuery 11    │  95.55ms │                           94.25ms │     no change │
│ QQuery 12    │ 112.20ms │                          110.78ms │     no change │
│ QQuery 13    │ 199.70ms │                          156.56ms │ +1.28x faster │
│ QQuery 14    │  30.56ms │                           32.08ms │     no change │
│ QQuery 15    │  34.63ms │                           31.02ms │ +1.12x faster │
│ QQuery 16    │ 104.27ms │                          105.36ms │     no change │
│ QQuery 17    │ 582.36ms │                          483.59ms │ +1.20x faster │
│ QQuery 18    │ 994.82ms │                          906.17ms │ +1.10x faster │
│ QQuery 19    │ 112.26ms │                          109.24ms │     no change │
│ QQuery 20    │ 204.93ms │                          218.78ms │  1.07x slower │
│ QQuery 21    │ 695.60ms │                          687.18ms │     no change │
│ QQuery 22    │  55.44ms │                           55.00ms │     no change │
└──────────────┴──────────┴───────────────────────────────────┴───────────────┘

there's around 30% performance gain compared to the main branch

I have not been able to reproduce these results using EBAY-KYLIN-4003-5

Comparing main and EBAY-KYLIN-4003-5
--------------------
Benchmark tpch_mem.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃     main ┃ EBAY-KYLIN-4003-5 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 385.21ms │          379.92ms │     no change │
│ QQuery 2     │ 102.04ms │          103.16ms │     no change │
│ QQuery 3     │ 104.50ms │          110.41ms │  1.06x slower │
│ QQuery 4     │  69.37ms │           74.71ms │  1.08x slower │
│ QQuery 5     │ 234.38ms │          249.58ms │  1.06x slower │
│ QQuery 6     │  28.62ms │           28.92ms │     no change │
│ QQuery 7     │ 550.32ms │          589.97ms │  1.07x slower │
│ QQuery 8     │ 160.42ms │          157.83ms │     no change │
│ QQuery 9     │ 348.10ms │          364.37ms │     no change │
│ QQuery 10    │ 201.04ms │          204.18ms │     no change │
│ QQuery 11    │  95.55ms │          103.85ms │  1.09x slower │
│ QQuery 12    │ 112.20ms │          113.52ms │     no change │
│ QQuery 13    │ 199.70ms │          178.66ms │ +1.12x faster │
│ QQuery 14    │  30.56ms │           31.45ms │     no change │
│ QQuery 15    │  34.63ms │           34.25ms │     no change │
│ QQuery 16    │ 104.27ms │          106.39ms │     no change │
│ QQuery 17    │ 582.36ms │          520.88ms │ +1.12x faster │
│ QQuery 18    │ 994.82ms │         1030.34ms │     no change │
│ QQuery 19    │ 112.26ms │          113.70ms │     no change │
│ QQuery 20    │ 204.93ms │          193.52ms │ +1.06x faster │
│ QQuery 21    │ 695.60ms │          670.74ms │     no change │
│ QQuery 22    │  55.44ms │           55.78ms │     no change │
└──────────────┴──────────┴───────────────────┴───────────────┘

@tustvold
Copy link
Contributor

Marking as a draft whilst we work out how best to proceed here

@tustvold tustvold marked this pull request as draft July 30, 2023 20:58
@tustvold
Copy link
Contributor

Closing this as I believe the consensus reached was not to proceed with this, feel free to reopen if I am mistaken

@tustvold tustvold closed this Dec 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add FixedWidthRows to the arrow-row
4 participants