Vectorized lexicographical_partition_ranges (~80% faster) #4575

Merged: 6 commits, merged on Aug 3, 2023


@tustvold (Contributor) commented Jul 27, 2023

Which issue does this PR close?

Closes #4614

Rationale for this change

lexicographical_partition_ranges(u8) 2^10
                        time:   [1.5420 µs 1.5426 µs 1.5432 µs]
                        change: [-81.810% -81.793% -81.778%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

lexicographical_partition_ranges(u8) 2^12
                        time:   [2.3049 µs 2.3062 µs 2.3076 µs]
                        change: [-85.514% -85.454% -85.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low mild
  2 (2.00%) high mild

lexicographical_partition_ranges(u8) 2^10 with nulls
                        time:   [2.3843 µs 2.3858 µs 2.3873 µs]
                        change: [-67.819% -67.794% -67.767%] (p = 0.00 < 0.05)
                        Performance has improved.

lexicographical_partition_ranges(u8) 2^12 with nulls
                        time:   [3.2080 µs 3.2098 µs 3.2117 µs]
                        change: [-77.353% -77.335% -77.319%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe

lexicographical_partition_ranges(f64) 2^10
                        time:   [3.2753 µs 3.2774 µs 3.2799 µs]
                        change: [-82.556% -82.542% -82.528%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

lexicographical_partition_ranges(low cardinality) 1024
                        time:   [388.79 ns 389.09 ns 389.40 ns]
                        change: [-22.807% -22.726% -22.644%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe

What changes are included in this PR?

Are there any user-facing changes?

github-actions bot added the arrow label (Changes to the arrow crate) on Jul 27, 2023
tustvold changed the title from "Faster lexicographical_partition_ranges (~80% faster)" to "Vectorized lexicographical_partition_ranges (~80% faster)" on Jul 27, 2023
partition_point(start + bound / 2, end.min(start + bound + 1), |idx| {
    comparator.compare(idx, target) != Ordering::Greater
})
Ok(out.into_iter())
tustvold (Contributor Author):

I opted to preserve the existing function signature for now, but I can definitely see a future version returning the computed bitmask somehow, to allow for more optimal processing.

Contributor:

Maybe worth a ticket (I can also update the docs in #4615)
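For context, the preserved signature still yields Range<usize> items, so the computed bitmask has to be turned into ranges. Below is a minimal sketch of that step. It assumes LSB-first u64 words where bit i marks a boundary between rows i and i + 1, and ranges_from_words is a hypothetical helper, not an arrow function. It jumps between set bits with trailing_zeros instead of testing every row, which is the kind of word-at-a-time processing a bitmask-returning API could let callers do themselves.

```rust
use std::ops::Range;

/// Hypothetical helper: `words` holds the boundary bitmask, LSB-first, where
/// bit `i` is set when rows `i` and `i + 1` are in different partitions.
/// Assumes `num_rows >= 1` and that bits at positions >= `num_rows - 1` are zero.
fn ranges_from_words(words: &[u64], num_rows: usize) -> Vec<Range<usize>> {
    assert!(num_rows >= 1);
    let mut ranges = Vec::new();
    let mut start = 0;
    for (word_idx, &word) in words.iter().enumerate() {
        let mut remaining = word;
        while remaining != 0 {
            // Index of the lowest set bit = row that ends the current partition
            let bit = remaining.trailing_zeros() as usize;
            let row = word_idx * 64 + bit;
            ranges.push(start..row + 1);
            start = row + 1;
            // Clear the lowest set bit and continue with the next one
            remaining &= remaining - 1;
        }
    }
    // The final partition always runs to the last row
    ranges.push(start..num_rows);
    ranges
}
```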

Some(n) => {
    let n1 = n.inner().slice(0, slice_len);
    let n2 = n.inner().slice(1, slice_len);
    &(&n1 ^ &n2) | &values_ne
tustvold (Contributor Author), Jul 27, 2023:

There is quite possibly a more clever way to detect bit transitions in a bitmask; however, this is likely already fast enough that it is irrelevant.

Contributor:

Agreed. It took me a while to follow the logic though, so a comment could help: "values are either not-equal (and both non-null) or exactly one value is null"

Contributor:

this is quite clever
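The null handling discussed in this thread can be sketched for a single 64-bit word. This is an illustration under stated assumptions, not the arrow kernel: values is a plain &[i64] of at most 64 rows, validity is a u64 with bit i set when row i is non-null (LSB-first), and boundary_word is a hypothetical name. Shifting validity by one row plays the role of the two slice calls, and a result bit is set exactly when the values differ with both rows non-null, or exactly one of the two rows is null.

```rust
/// Hypothetical word-level sketch of the null-aware boundary computation.
/// Bit `i` of the result is set when rows `i` and `i + 1` belong to
/// different partitions.
fn boundary_word(values: &[i64], validity: u64) -> u64 {
    let num_rows = values.len();
    assert!((1..=64).contains(&num_rows), "sketch handles 1..=64 rows");
    let num_pairs = num_rows - 1; // at most 63, so the shift cannot overflow
    let pair_mask = (1u64 << num_pairs) - 1;

    // Equivalent of slicing the validity bitmap at offsets 0 and 1
    let n1 = validity & pair_mask;
    let n2 = (validity >> 1) & pair_mask;

    // Raw "values differ" bits (arrow computes these with a vectorized kernel)
    let mut raw_ne = 0u64;
    for i in 0..num_pairs {
        if values[i] != values[i + 1] {
            raw_ne |= 1u64 << i;
        }
    }
    // values_ne: values differ AND both rows are non-null
    let values_ne = raw_ne & n1 & n2;

    // Boundary if the values differ, or exactly one of the two rows is null
    (n1 ^ n2) | values_ne
}
```

In the first test case, rows 1 and 2 are both null with different underlying values (5 and 7), yet no boundary appears between them: n1 ^ n2 is 0 at that position and values_ne is masked off.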

@alamb (Contributor) commented Aug 1, 2023

I filed #4614 to track this. I am reviewing this PR as well. Thank you @tustvold -- looks very exciting

@alamb (Contributor) left a comment:

I think this code looks very nice and clever, @tustvold. Well done 👏

cc @crepererum as I think you also used this approach while working on the deduplication logic in IOx.

The only thing I worry about with this change is the relative lack of test coverage. Specifically, I didn't see tests for the following cases, which seem important:

  1. partitioning of arrays with 0 and 1 elements -- given the new slicing, I think we should add a test to verify the correct behavior in these scenarios
  2. arrays where the values change but the slots are marked null, so they shouldn't start new partitions (to ensure the null mask handling works properly)
  3. multi-column partitioning where both arrays have null values, and
  4. arrays with nulls that are longer than 2 elements (i.e. where there are more than 2 partitions)
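Item 1 matters because the new implementation compares each row with its neighbour via slices of length num_rows - 1, which only makes sense for two or more rows. A minimal scalar sketch of the special-casing, assuming None stands in for a null slot and using a hypothetical partition_ranges function rather than the arrow API:

```rust
use std::ops::Range;

/// Hypothetical scalar sketch of sorted-column partitioning, with `None`
/// standing in for a null slot. Shows the special cases for 0 and 1 rows.
fn partition_ranges<T: PartialEq>(rows: &[Option<T>]) -> Vec<Range<usize>> {
    match rows.len() {
        // No rows: no partitions (computing `len - 1` here would underflow)
        0 => Vec::new(),
        // One row: one partition, and there are no adjacent pairs to compare
        1 => vec![0..1],
        num_rows => {
            let mut ranges = Vec::new();
            let mut start = 0;
            for i in 0..num_rows - 1 {
                // `None == None` is true, so adjacent nulls stay together,
                // while a null next to a value (or two different values) splits
                if rows[i] != rows[i + 1] {
                    ranges.push(start..i + 1);
                    start = i + 1;
                }
            }
            ranges.push(start..num_rows);
            ranges
        }
    }
}
```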


}

/// Returns the number of partitions
pub fn len(&self) -> usize {
tustvold (Contributor Author):

This allows a very quick check for the case where all partitions have a length of 1, which may enable a more efficient special case. In IOx, this would allow it to avoid calling into the dedup logic at all.
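A sketch of why len() makes the all-length-1 check cheap. It assumes, for illustration only, that partitions are stored as a boundary mask with one flag per adjacent pair of rows; Partitions below is a hypothetical stand-in that uses Vec<bool> where a real implementation would use a packed bitmap and a popcount. Every partition holds exactly one row precisely when the number of partitions equals the number of rows.

```rust
/// Hypothetical stand-in: `boundaries[i]` is true when rows `i` and `i + 1`
/// are in different partitions; `None` means there are no rows at all.
struct Partitions(Option<Vec<bool>>);

impl Partitions {
    /// Number of partitions = number of boundaries + 1 (or 0 for no rows)
    fn len(&self) -> usize {
        match &self.0 {
            Some(b) => b.iter().filter(|&&set| set).count() + 1,
            None => 0,
        }
    }

    /// Number of rows covered by the partitions
    fn num_rows(&self) -> usize {
        match &self.0 {
            Some(b) => b.len() + 1,
            None => 0,
        }
    }

    /// True when every partition holds exactly one row (no duplicate keys),
    /// so a caller could skip deduplication entirely
    fn all_singletons(&self) -> bool {
        self.len() == self.num_rows()
    }
}
```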

) -> Result<impl Iterator<Item = Range<usize>> + '_, ArrowError> {
LexicographicalPartitionIterator::try_new(columns)
}
pub fn partition(columns: &[ArrayRef]) -> Result<Partitions, ArrowError> {
tustvold (Contributor Author):

The API no longer takes SortColumn, since the sort order doesn't actually matter, only that the data is sorted.
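The reasoning: a boundary depends only on whether adjacent rows are equal, and equality does not care whether a column is sorted ascending or descending, or where nulls are placed, as long as equal rows are contiguous. Across several columns, a new partition starts wherever any column changes, so the per-column masks can simply be ORed. A minimal sketch, with per-column masks as Vec<bool> and a hypothetical combine_boundaries helper:

```rust
/// Hypothetical helper: ORs per-column boundary masks. Each mask has one
/// flag per adjacent pair of rows, so all masks must have the same length.
fn combine_boundaries(columns: &[Vec<bool>]) -> Vec<bool> {
    let mut iter = columns.iter();
    let mut acc = match iter.next() {
        Some(first) => first.clone(),
        None => return Vec::new(),
    };
    for col in iter {
        assert_eq!(col.len(), acc.len(), "columns must have equal length");
        // A boundary in any column is a boundary for the combined key
        for (a, &b) in acc.iter_mut().zip(col) {
            *a |= b;
        }
    }
    acc
}
```

For example, a column [1, 1, 2, 2] contributes a boundary after row 1 and a column [5, 6, 6, 6] contributes one after row 0, so the combined key splits after both rows.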

previous_partition_point: usize,
partition_point: usize,
}
match num_rows {
tustvold (Contributor Author):

@alamb Your testing suggestions were on the money 👍

@tustvold (Contributor Author) commented Aug 2, 2023

I have updated this PR with more tests and a cleaner API, PTAL

@alamb (Contributor) left a comment:

I think this looks really nice -- great work, @tustvold

cc @wolfcm


},
Arc::new(Int64Array::new(vec![1; 9].into(), None)) as _,
Arc::new(Int64Array::new(
vec![1, 1, 2, 2, 2, 3, 3, 3, 3].into(),
Contributor:

I think it would help to also have a test like this where the value in the null slot is actually wrong / different

Suggested change:
- vec![1, 1, 2, 2, 2, 3, 3, 3, 3].into(),
+ vec![1, 1, 2, 2, 2, 3, 0, 3, 3].into(),

However, I also made that change and it passed, so 👍
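The property under test can be checked in isolation: two adjacent null slots must stay in one partition even when their underlying values differ. A minimal scalar sketch using the two value vectors from the suggested change, with a hypothetical null mask (rows 5 and 6 null; the real test's mask is not shown in this excerpt) and a hypothetical boundaries function:

```rust
/// Hypothetical scalar boundary computation: `validity[i]` is false when
/// row `i` is null. Entry `i` of the result compares rows `i` and `i + 1`.
fn boundaries(values: &[i64], validity: &[bool]) -> Vec<bool> {
    assert_eq!(values.len(), validity.len());
    (0..values.len().saturating_sub(1))
        .map(|i| {
            let (va, vb) = (validity[i], validity[i + 1]);
            // Underlying values only count when both rows are non-null
            let values_ne = va && vb && values[i] != values[i + 1];
            // A null next to a non-null always starts a new partition
            values_ne || (va != vb)
        })
        .collect()
}
```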

///
/// Consecutive ranges will be contiguous: i.e. [`(a, b)` and `(b, c)`], and
/// `start = 0` and `end = self.len()` for the first and last range respectively
pub fn ranges(&self) -> Vec<Range<usize>> {
Contributor:

One potential difference I noticed between this implementation and main is that it requires memory (a Vec) to store the partition ranges, whereas the previous implementation just iterated over them. I think it would be possible to implement an iterator for this as well, to avoid that regression. I'll see if I can make a PR to do so.

Contributor:

I tried to make this work but got stymied by the borrow checker -- specifically, BitIndexIterator holds a reference, so I couldn't embed it in another iterator.
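One possible way around the self-borrow, sketched under the assumption that the iterator may own its boundary mask, is to store the mask plus a cursor rather than a borrowing BitIndexIterator. RangesIter is hypothetical and uses a Vec<bool> for simplicity; it is not what was merged here, and it assumes at least one row.

```rust
use std::ops::Range;

/// Hypothetical owning iterator: yields contiguous partition ranges lazily
/// from a boundary mask (`boundaries[i]` splits rows `i` and `i + 1`).
struct RangesIter {
    boundaries: Vec<bool>,
    // Next position in `boundaries` to scan
    pos: usize,
    // Start row of the next range; `None` once iteration is finished
    start: Option<usize>,
}

impl RangesIter {
    fn new(boundaries: Vec<bool>) -> Self {
        Self { boundaries, pos: 0, start: Some(0) }
    }
}

impl Iterator for RangesIter {
    type Item = Range<usize>;

    fn next(&mut self) -> Option<Range<usize>> {
        let start = self.start?;
        while self.pos < self.boundaries.len() {
            let i = self.pos;
            self.pos += 1;
            if self.boundaries[i] {
                // Row `i` ends the current partition
                self.start = Some(i + 1);
                return Some(start..i + 1);
            }
        }
        // No more boundaries: the last range runs to the final row
        self.start = None;
        Some(start..self.boundaries.len() + 1)
    }
}
```

Since arrow buffers are reference-counted, cloning a bitmap into such an owning iterator should be cheap; this sketch does, however, give up word-at-a-time bit scanning for simplicity.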

alamb merged commit 841a6a9 into apache:master on Aug 3, 2023
Labels: arrow (Changes to the arrow crate)
Linked issue: Improve speed of lexicographical_partition_ranges
3 participants