Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-32381: [C++] Improve error handling for hash table merges #44969

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

helloitsheqing
Copy link

Original Issue: #32381

Description

This pull request introduces a comprehensive test suite for the hash aggregation functionality in Apache Arrow’s compute kernels. The new test suite covers the following key components:

GroupedCountImpl:

  • Validates merging of grouped counts with valid inputs.
  • Tests behavior with invalid group ID mappings (e.g., mismatched sizes, out-of-bounds indices).

GroupedMinMaxImpl:

  • Verifies correctness of merging minimum and maximum values across groups.
  • Includes edge cases for invalid and out-of-bounds group ID mappings.

GroupedCountAllImpl:

  • Tests merging of aggregated counts for "count all" scenarios.
  • Handles cases with valid and invalid inputs, ensuring robust error handling.

The test suite is structured following the conventions of aggregate_test.cc, providing comprehensive coverage of valid and edge-case scenarios.

Key Changes

New Test File: Added hash_aggregate_test.cc in cpp/src/arrow/compute/kernels/.
CMake Configuration: Updated cpp/src/arrow/compute/kernels/CMakeLists.txt to include the new test file.
Test Coverage:
Each test focuses on verifying correctness, error handling, and robustness for the Merge methods of:

  • GroupedCountImpl
  • GroupedMinMaxImpl
  • GroupedCountAllImpl

Future Work

If this test suite is approved, I plan to extend similar coverage to the remaining hash aggregation classes, including:

  • GroupedTDigestImpl
  • GroupedFirstLastImpl
  • GroupedBooleanAggregator
  • Other GroupedAggregator subclasses.
    This phased approach allows focused and iterative improvements to the test coverage for Arrow compute kernels.

@github-actions github-actions bot added the awaiting review Awaiting review label Dec 9, 2024
@helloitsheqing helloitsheqing marked this pull request as draft December 9, 2024 03:54
@helloitsheqing helloitsheqing marked this pull request as ready for review December 9, 2024 03:56
@helloitsheqing helloitsheqing reopened this Dec 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting review Awaiting review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant