MSQ: Validate that strings and string arrays are not mixed. #15920
Merged: gianm merged 10 commits into apache:master from gianm:sql-ingest-validate-string-array-mismatch on Mar 13, 2024
Commits (10):
c89e67b MSQ: Validate that strings and string arrays are not mixed. (gianm)
083f736 Fixes. (gianm)
092cb75 Merge branch 'master' into sql-ingest-validate-string-array-mismatch (gianm)
4e01ce4 Merge branch 'master' into sql-ingest-validate-string-array-mismatch (gianm)
edf2843 Tests and docs and error messages. (gianm)
09be4b6 More docs. (gianm)
9f8f827 Adjustments. (gianm)
ef7e081 Adjust message. (gianm)
a765412 Fix tests. (gianm)
09d63d2 Fix test in DV mode. (gianm)
@@ -71,19 +71,61 @@ The following shows an example `dimensionsSpec` for native ingestion of the data

### SQL-based ingestion

Arrays can also be inserted with [SQL-based ingestion](../multi-stage-query/index.md) when you include a query context parameter [`"arrayIngestMode":"array"`](../multi-stage-query/reference.md#context-parameters).
#### `arrayIngestMode`

Arrays can be inserted with [SQL-based ingestion](../multi-stage-query/index.md) when you include the query context
parameter `arrayIngestMode: array`.

When `arrayIngestMode` is `array`, SQL ARRAY types are stored using Druid array columns. This is recommended for new
tables.

When `arrayIngestMode` is `mvd`, SQL `VARCHAR ARRAY` are implicitly wrapped in [`ARRAY_TO_MV`](sql-functions.md#array_to_mv).
This causes them to be stored as [multi-value strings](multi-value-dimensions.md), using the same `STRING` column type
as regular scalar strings. SQL `BIGINT ARRAY` and `DOUBLE ARRAY` cannot be loaded under `arrayIngestMode: mvd`. This
is the default behavior when `arrayIngestMode` is not provided in your query context, although the default behavior
may change to `array` in a future release.

When `arrayIngestMode` is `none`, Druid throws an exception when trying to store any type of arrays. This mode is most
useful when set in the system default query context with `druid.query.default.context.arrayIngestMode = none`, in cases
where the cluster administrator wants SQL query authors to explicitly provide one or the other in their query context.
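
As a rough illustration of how `arrayIngestMode` travels with a statement, the sketch below builds the JSON body for a SQL-based ingestion request with the mode set in the query context. It assumes the statement is submitted to the SQL task endpoint (`/druid/v2/sql/task`). The helper name `build_ingest_request` and the placeholder statement are hypothetical, and nothing is sent over the network.

```python
import json

# The three modes described in the text above.
VALID_ARRAY_INGEST_MODES = {"array", "mvd", "none"}


def build_ingest_request(sql, array_ingest_mode="array"):
    """Return a JSON request body that carries arrayIngestMode in the query context.

    Hypothetical helper for illustration only; it is not part of Druid.
    """
    if array_ingest_mode not in VALID_ARRAY_INGEST_MODES:
        raise ValueError(f"unknown arrayIngestMode: {array_ingest_mode!r}")
    body = {
        "query": sql,
        "context": {"arrayIngestMode": array_ingest_mode},
    }
    return json.dumps(body)


# Placeholder statement; substitute a real INSERT or REPLACE query.
payload = build_ingest_request('REPLACE INTO "array_example" OVERWRITE ALL SELECT ...')
print(payload)
```

Setting the mode explicitly per request, rather than relying on the default, keeps the stored type predictable if the default changes in a later release.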

The following table summarizes the differences in SQL ARRAY handling between `arrayIngestMode: array` and
`arrayIngestMode: mvd`.

| SQL type | Stored type when `arrayIngestMode: array` | Stored type when `arrayIngestMode: mvd` (default) |
|---|---|---|
|`VARCHAR ARRAY`|`ARRAY<STRING>`|[multi-value `STRING`](multi-value-dimensions.md)|
|`BIGINT ARRAY`|`ARRAY<LONG>`|not possible (validation error)|
|`DOUBLE ARRAY`|`ARRAY<DOUBLE>`|not possible (validation error)|
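
The table can also be read as a lookup from SQL type and mode to stored type. The following is a minimal sketch of that reading, transcribed from the table and from the description of `none`. The function name `stored_type` and the `ValueError` are illustrative assumptions and do not reflect how Druid reports the validation error.

```python
# Stored types per arrayIngestMode and SQL type, transcribed from the table above.
STORED_TYPES = {
    "array": {
        "VARCHAR ARRAY": "ARRAY<STRING>",
        "BIGINT ARRAY": "ARRAY<LONG>",
        "DOUBLE ARRAY": "ARRAY<DOUBLE>",
    },
    "mvd": {
        "VARCHAR ARRAY": "multi-value STRING",
        # BIGINT ARRAY and DOUBLE ARRAY are absent: not possible in mvd mode.
    },
    "none": {},  # no array type can be stored in this mode
}


def stored_type(sql_type, array_ingest_mode="mvd"):
    """Return the stored type for a SQL array type, or raise if it cannot be stored.

    Hypothetical helper; the default of "mvd" mirrors the documented default mode.
    """
    if array_ingest_mode not in STORED_TYPES:
        raise ValueError(f"unknown arrayIngestMode: {array_ingest_mode!r}")
    by_type = STORED_TYPES[array_ingest_mode]
    if sql_type not in by_type:
        raise ValueError(
            f"{sql_type} cannot be stored when arrayIngestMode is {array_ingest_mode}"
        )
    return by_type[sql_type]


print(stored_type("VARCHAR ARRAY", "array"))
print(stored_type("VARCHAR ARRAY"))
```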

In either mode, you can explicitly wrap string arrays in `ARRAY_TO_MV` to cause them to be stored as
[multi-value strings](multi-value-dimensions.md).

When validating a SQL INSERT or REPLACE statement that contains arrays, Druid checks whether the statement would lead
to mixing string arrays and multi-value strings in the same column. If this condition is detected, the statement fails
validation unless the column is named under the `skipTypeVerification` context parameter. This parameter can be either
a comma-separated list of column names, or a JSON array in string form. This validation is done to prevent accidentally
mixing arrays and multi-value strings in the same column.
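
The check described above can be sketched as follows. This is an illustration of the documented behavior under stated assumptions, not Druid's implementation: the function names are hypothetical, column types are represented as plain strings (`STRING` for multi-value strings, `ARRAY<STRING>` for string arrays), and trimming whitespace around names in the comma-separated form is an assumption.

```python
import json


def parse_skip_type_verification(value):
    """Parse a comma-separated list or a JSON array in string form into a set of names."""
    text = value.strip()
    if text.startswith("["):
        names = json.loads(text)  # JSON array in string form
    else:
        names = text.split(",")  # comma-separated list
    # Whitespace trimming is an assumption made for this sketch.
    return {str(name).strip() for name in names if str(name).strip()}


def columns_failing_validation(existing_types, incoming_types, skip_value=""):
    """Return columns that would mix string arrays and strings, minus skipped ones."""
    skipped = parse_skip_type_verification(skip_value)
    mixed_pair = {"STRING", "ARRAY<STRING>"}
    failing = []
    for column, new_type in incoming_types.items():
        old_type = existing_types.get(column)
        # A column fails only when one side is STRING and the other ARRAY<STRING>.
        if {old_type, new_type} == mixed_pair and column not in skipped:
            failing.append(column)
    return sorted(failing)


existing = {"tags": "STRING", "label": "STRING"}
incoming = {"tags": "ARRAY<STRING>", "label": "STRING"}
print(columns_failing_validation(existing, incoming))
print(columns_failing_validation(existing, incoming, '["tags"]'))
```

Naming a column in `skipTypeVerification` only bypasses the check for that column; other columns in the same statement are still validated.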

#### Examples

Set [`arrayIngestMode: array`](#arrayingestmode) in your query context to run the following examples.

For example, to insert the data used in this document:
```sql
REPLACE INTO "array_example" OVERWRITE ALL
WITH "ext" AS (
SELECT *
FROM TABLE(
EXTERN(
'{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}',
'{"type":"json"}',
'[{"name":"timestamp", "type":"STRING"},{"name":"label", "type":"STRING"},{"name":"arrayString", "type":"ARRAY<STRING>"},{"name":"arrayLong", "type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
'{"type":"json"}'
)
) EXTEND (
"timestamp" VARCHAR,
"label" VARCHAR,
"arrayString" VARCHAR ARRAY,
"arrayLong" BIGINT ARRAY,
"arrayDouble" DOUBLE ARRAY
-- Review comment on lines +123 to +128: thanks for fixing these up, i wrote these docs before i fixed up
)
)
SELECT
@@ -96,8 +138,7 @@ FROM "ext"
PARTITIONED BY DAY
```

### SQL-based ingestion with rollup
These input arrays can also be grouped for rollup:
Arrays can also be used as `GROUP BY` keys for rollup:

```sql
REPLACE INTO "array_example_rollup" OVERWRITE ALL
@@ -106,9 +147,14 @@ WITH "ext" AS (
FROM TABLE(
EXTERN(
'{"type":"inline","data":"{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row1\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, null,3], \"arrayDouble\":[1.1, 2.2, null]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row2\", \"arrayString\": [null, \"b\"], \"arrayLong\":null, \"arrayDouble\":[999, null, 5.5]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row3\", \"arrayString\": [], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[null, 2.2, 1.1]} \n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row4\", \"arrayString\": [\"a\", \"b\"], \"arrayLong\":[1, 2, 3], \"arrayDouble\":[]}\n{\"timestamp\": \"2023-01-01T00:00:00\", \"label\": \"row5\", \"arrayString\": null, \"arrayLong\":[], \"arrayDouble\":null}"}',
'{"type":"json"}',
'[{"name":"timestamp", "type":"STRING"},{"name":"label", "type":"STRING"},{"name":"arrayString", "type":"ARRAY<STRING>"},{"name":"arrayLong", "type":"ARRAY<LONG>"},{"name":"arrayDouble", "type":"ARRAY<DOUBLE>"}]'
'{"type":"json"}'
)
) EXTEND (
"timestamp" VARCHAR,
"label" VARCHAR,
"arrayString" VARCHAR ARRAY,
"arrayLong" BIGINT ARRAY,
"arrayDouble" DOUBLE ARRAY
)
)
SELECT
nit: will change