Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for contains function in datafusion substrait consumer #10861

Closed
richtia opened this issue Jun 10, 2024 · 9 comments · Fixed by #10879
Closed

Support for contains function in datafusion substrait consumer #10861

richtia opened this issue Jun 10, 2024 · 9 comments · Fixed by #10879
Assignees
Labels
enhancement New feature or request substrait

Comments

@richtia
Copy link

richtia commented Jun 10, 2024

Is your feature request related to a problem or challenge?

Datafusion does not have a contains function and will return an exception if a substrait plan contains it

Exception: DataFusion error: NotImplemented("Unsupported function name: \"contains:str_str\"")

Describe the solution you'd like

When the datafusion consumer sees the contains function, it can convert it to a combination of strpos + >

strpos(lit("abcd"), lit("de") > 0

Describe alternatives you've considered

No response

Additional context

No response

@richtia richtia added the enhancement New feature or request label Jun 10, 2024
@Lordworms
Copy link
Contributor

take

@Lordworms
Copy link
Contributor

Do we really need this conversion? what about adding a new ScalarFunction 'contains'? @alamb I would like to receive your advice here.

@alamb
Copy link
Contributor

alamb commented Jun 11, 2024

I am not sure @Lordworms -- maybe @waynexia or @Blizzara have more context / understanding

@richtia does this request come from a particular query you are runniing?

@richtia
Copy link
Author

richtia commented Jun 11, 2024

My preference would actually be to add the function as well. The request came from trying to run a query with the contains function.

@Blizzara
Copy link
Contributor

+1 to adding the function into DataFusion (+ the relevant mapping into Substrait) if it doesn't exist yet.

Seems like Arrow has a contains function but I cannot find it in DataFusion so maybe it doesn't indeed exist yet, dunno if the Arrow version can somehow be easily used?

@Lordworms
Copy link
Contributor

+1 to adding the function into DataFusion (+ the relevant mapping into Substrait) if it doesn't exist yet.

Seems like Arrow has a contains function but I cannot find it in DataFusion so maybe it doesn't indeed exist yet, dunno if the Arrow version can somehow be easily used?

Sure, I'll file a PR later today

@richtia
Copy link
Author

richtia commented Jun 11, 2024

+1 to adding the function into DataFusion (+ the relevant mapping into Substrait) if it doesn't exist yet.

Seems like Arrow has a contains function but I cannot find it in DataFusion so maybe it doesn't indeed exist yet, dunno if the Arrow version can somehow be easily used?

I didn't see the function in DataFusion either. I also couldn't find it in Arrow, at least not under any compute function.

@Lordworms
Copy link
Contributor

+1 to adding the function into DataFusion (+ the relevant mapping into Substrait) if it doesn't exist yet.
Seems like Arrow has a contains function but I cannot find it in DataFusion so maybe it doesn't indeed exist yet, dunno if the Arrow version can somehow be easily used?

I didn't see the function in DataFusion either. I also couldn't find it in Arrow, at least not under any compute function.

image I think DuckDB got a similar function, let me know if it is ok for me to implement similar behavior like this

@richtia
Copy link
Author

richtia commented Jun 11, 2024

Yeah, duckdb has a contains function as well as the substrait support. The same substrait plans that gave me the datafusion errors were able to execute against the DuckDB substrait consumer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request substrait
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants