feat: Added Timestamp/Binary/Float to fuzz #13280

Merged (5 commits) on Nov 10, 2024

jonathanc-n

Which issue does this PR close?

Closes #13279.

What changes are included in this PR?

Added timestamp, binary, and float types to the fuzz tests.

github-actions bot added the core (Core DataFusion crate) label on Nov 6, 2024
),
ColumnDescr::new("binary", DataType::Binary),
ColumnDescr::new("large_binary", DataType::LargeBinary),
ColumnDescr::new("binaryview", DataType::BinaryView),

LeslieKid commented Nov 7, 2024

I think we can put binary near string types instead of placing it in the middle of some fixed-size primitive types.

use rand::Rng;

/// Randomly generate binary arrays
pub struct BinaryArrayGenerator {

👍

@@ -171,6 +172,22 @@ fn baseline_config() -> DatasetGeneratorConfig {
ColumnDescr::new("time32_ms", DataType::Time32(TimeUnit::Millisecond)),
ColumnDescr::new("time64_us", DataType::Time64(TimeUnit::Microsecond)),
ColumnDescr::new("time64_ns", DataType::Time64(TimeUnit::Nanosecond)),
// TODO: randomize timezones for timestamp types

Let's create a ticket instead of a TODO.

Vec::new()
} else {
let len = rng.gen_range(1..=max_len);
(0..len).map(|_| rng.gen()).collect()

wondering if len differs from max_len?

It seems that len is the actual length of each value, drawn from the range 1..=max_len, while max_len is only the upper bound.
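
To make the len versus max_len distinction concrete, here is a minimal sketch of how a per-value length could be drawn. It is not the PR's code: `XorShift64` is a hypothetical stand-in for the seeded `StdRng` so the example needs no external crates, `random_binary` is an illustrative name, and the `max_len == 0` guard is an assumption, since the condition before `Vec::new()` is not visible in the hunk.

```rust
/// Tiny xorshift64 PRNG. Hypothetical stand-in for `rand::rngs::StdRng`
/// so that this sketch compiles without external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift state must never be zero.
        Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }

    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Value in `lo..=hi` (modulo bias is ignored for brevity).
    fn gen_range_inclusive(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() % (hi - lo + 1) as u64) as usize
    }
}

/// Illustrative helper: `max_len` is the upper bound, `len` is the
/// length actually drawn for this particular value.
fn random_binary(rng: &mut XorShift64, max_len: usize) -> Vec<u8> {
    if max_len == 0 {
        // Assumed guard: nothing to draw when the bound is zero.
        Vec::new()
    } else {
        let len = rng.gen_range_inclusive(1, max_len);
        (0..len).map(|_| (rng.next_u64() >> 56) as u8).collect()
    }
}
```

Calling `random_binary(&mut rng, 8)` repeatedly should yield values whose lengths vary between 1 and 8 bytes, which is why len and max_len generally differ.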

pub rng: StdRng,
}

impl BinaryArrayGenerator {

Love it. Wondering if we should add tests for this generator?

the generator itself is part of a test 🤔 What would we test? Maybe that the distinct values are as specified?
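
One way the "distinct values are as specified" check could look is sketched below. This is an assumption-laden illustration, not the PR's generator: `Lcg`, `gen_low_cardinality`, and `distinct_count` are hypothetical names, and the pool-then-sample strategy is only one plausible way to bound cardinality.

```rust
use std::collections::HashSet;

/// Minimal LCG. Hypothetical stand-in for a seeded `StdRng`
/// so that this sketch needs no external crates.
struct Lcg(u64);

impl Lcg {
    fn next_u32(&mut self) -> u32 {
        // Knuth's MMIX constants; the high bits are the better-mixed ones.
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (self.0 >> 32) as u32
    }
}

/// Hypothetical generator: build a pool of at most `num_distinct` values,
/// then sample `num_values` entries from that pool.
fn gen_low_cardinality(
    rng: &mut Lcg,
    num_values: usize,
    num_distinct: usize,
    max_len: usize,
) -> Vec<Vec<u8>> {
    let pool: Vec<Vec<u8>> = (0..num_distinct.max(1))
        .map(|_| {
            // Length in 1..=max_len (treating a zero bound as one).
            let len = 1 + rng.next_u32() as usize % max_len.max(1);
            (0..len).map(|_| (rng.next_u32() >> 24) as u8).collect()
        })
        .collect();
    (0..num_values)
        .map(|_| pool[rng.next_u32() as usize % pool.len()].clone())
        .collect()
}

/// Number of distinct values in the generated column.
fn distinct_count(values: &[Vec<u8>]) -> usize {
    values.iter().collect::<HashSet<_>>().len()
}
```

A test along these lines would generate a column with a configured bound, for example 10, and assert that `distinct_count` never exceeds it.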

alamb left a comment

Thanks @jonathanc-n -- I think this looks great

@@ -210,6 +226,9 @@ fn baseline_config() -> DatasetGeneratorConfig {
// low cardinality columns
ColumnDescr::new("u8_low", DataType::UInt8).with_max_num_distinct(10),
ColumnDescr::new("utf8_low", DataType::Utf8).with_max_num_distinct(10),
ColumnDescr::new("binary", DataType::Binary),

We could potentially remove the binary TODO a few lines above.

alamb merged commit 31d27c2 into apache:main on Nov 10, 2024
25 checks passed

alamb commented Nov 10, 2024

Thanks again @jonathanc-n -- and thanks to @comphead @LeslieKid for the reviews

jayzhan211 pushed a commit to jayzhan211/datafusion that referenced this pull request Nov 12, 2024
* Added Timestamp/Binary/Float to fuzz

* clippy fix

* small fix

* remove todo

* remove todo

jonathanc-n deleted the add-timestamp/binary/float-to-fuzz branch November 27, 2024 22:12
Labels: core (Core DataFusion crate)
Linked issue: Add fuzz support for Timestamp, Binary and Float
4 participants