Apache Druid 29.0.0 contains over 350 new features, bug fixes, performance enhancements, documentation improvements, and additional test coverage from 67 contributors.
See the complete set of changes for additional details, including bug fixes.
Review the upgrade notes before you upgrade to Druid 29.0.0.
If you are upgrading across multiple versions, see the Upgrade notes page, which lists upgrade notes for the most recent Druid versions.
# Important features, changes, and deprecations
This section contains important information about new and existing features.
# MSQ export statements (experimental)
Druid 29.0.0 adds experimental support for export statements to the MSQ task engine. This allows query tasks to write data to an external destination through the `EXTERN` function. #15689
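For example, a minimal sketch of an export query, assuming an S3 destination (the bucket, prefix, and source datasource are hypothetical):

```sql
INSERT INTO
  EXTERN(S3(bucket => 'your_bucket', prefix => 'prefix/to/files'))
AS CSV
SELECT "channel", "page"
FROM "wikipedia"
```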
# SQL PIVOT and UNPIVOT (experimental)
Druid 29.0.0 adds experimental support for the SQL PIVOT and UNPIVOT operators.
The PIVOT operator carries out an aggregation and transforms rows into columns in the output. The following is the general syntax for the PIVOT operator:
```sql
PIVOT (aggregation_function(column_to_aggregate)
  FOR column_with_values_to_pivot
  IN (pivoted_column1 [, pivoted_column2 ...])
)
```
The UNPIVOT operator transforms existing column values into rows. The following is the general syntax for the UNPIVOT operator:
```sql
UNPIVOT (values_column
  FOR names_column
  IN (unpivoted_column1 [, unpivoted_column2 ... ])
)
```
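For example (a minimal sketch; the `sales` and `sales_pivoted` datasources and their columns are hypothetical):

```sql
-- Produce one summed-revenue column per city value
SELECT *
FROM "sales"
  PIVOT (SUM("revenue") FOR "city" IN ('SF', 'LA'))

-- Turn per-city revenue columns back into (city, revenue) rows
SELECT *
FROM "sales_pivoted"
  UNPIVOT ("revenue" FOR "city" IN ("sf_revenue", "la_revenue"))
```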
# Range support in window functions (experimental)
Window functions (experimental) now support ranges where both endpoints are unbounded or are the current row. Ranges work in strict mode, which means that Druid will fail queries that aren't supported. You can turn off strict mode for ranges by setting the context parameter `windowingStrictValidation` to `false`. #15703 #15746

The following examples show window expressions with RANGE frame specifications:
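```sql
(ORDER BY c)
(ORDER BY c RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
(ORDER BY c RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
```

For instance, a complete query using such a frame (a minimal sketch, assuming the quickstart `wikipedia` datasource and the experimental window-function support enabled in the query context):

```sql
SELECT
  "cityName",
  "added",
  SUM("added") OVER (
    ORDER BY "added"
    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS "running_total"
FROM "wikipedia"
```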
# Improved INNER joins
Druid now supports arbitrary join conditions for INNER join. Any sub-conditions that can't be evaluated as part of the join are converted to a post-join filter. Improved join capabilities allow Druid to more effectively support applications like Tableau.
#15302
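For example (a minimal sketch; the self-join on the quickstart `wikipedia` datasource is hypothetical), the inequality sub-condition below can't be evaluated as part of the join, so Druid converts it to a post-join filter:

```sql
SELECT t1."channel", t2."page"
FROM "wikipedia" t1
INNER JOIN "wikipedia" t2
  ON t1."countryIsoCode" = t2."countryIsoCode"
  AND t1."added" > t2."delta"
```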
# Improved concurrent append and replace (experimental)
You no longer have to manually determine the task lock type for concurrent append and replace (experimental) with the `taskLockType` task context. Instead, Druid can now determine it automatically for you. You can use the context parameter `"useConcurrentLocks": true` for individual tasks and datasources or enable concurrent append and replace at a cluster level using `druid.indexer.task.default.context`. #15684
# First and last aggregators for double, float, and long data types
Druid now supports first and last aggregators for the double, float, and long types in native ingestion specs and MSQ queries. Previously, they were only supported for native queries. For more information, see First and last aggregators.
#14462
Additionally, the following functions can now return numeric values:

- EARLIEST and EARLIEST_BY
- LATEST and LATEST_BY

You can use these functions as aggregators at ingestion time. #15607
# Support for logging audit events
Added support for logging audit events and improved coverage of audited REST API endpoints.
To enable logging audit events, set the config `druid.audit.manager.type` to `log` in both the Coordinator and Overlord or in `common.runtime.properties`. When you set `druid.audit.manager.type` to `sql`, audit events are persisted to the metadata store. Druid audits the same events in both cases. #15480 #15653
Also fixed an issue with the basic auth integration test by not persisting logs to the database.
#15561
# Enabled empty ingest queries
The MSQ task engine now allows empty ingest queries by default. Previously, ingest queries that produced no data would fail with the `InsertCannotBeEmpty` MSQ fault.

For more information, see Empty ingest queries in the upgrade notes.
#15674 #15495
In the web console, you can use a toggle to control whether an ingestion fails if the ingestion query produces no data.
#15627
# MSQ support for Google Cloud Storage
The MSQ task engine now supports Google Cloud Storage (GCS). You can use durable storage with GCS. See Durable storage configurations for more information.
#15398
# Experimental extensions
Druid 29.0.0 adds the following extensions.
# DDSketch
A new DDSketch extension is available as a community contribution. The DDSketch extension (`druid-ddsketch`) provides support for approximate quantile queries using the DDSketch library. #15049
# Spectator histogram
A new histogram extension is available as a community contribution. The Spectator-based histogram extension (`druid-spectator-histogram`) provides approximate histogram aggregators and percentile post-aggregators based on Spectator fixed-bucket histograms. #15340
# Delta Lake
A new Delta Lake extension is available as a community contribution. The Delta Lake extension (`druid-deltalake-extensions`) lets you use the Delta Lake input source to ingest data stored in a Delta Lake table into Apache Druid. #15755
# Functional area and related changes
This section contains detailed release notes separated by areas.
# Web console
# Support for array types
Added support for array types for all the ingestion wizards.
When loading multi-value dimensions or arrays using Druid's Query console, note the value of the `arrayIngestMode` parameter. Druid now configures the `arrayIngestMode` parameter in the data loading flow, and its value can persist across the SQL tab, even if you execute unrelated Data Manipulation Language (DML) operations within the same tab. #15588
# File inputs for query detail archive
The Load query detail archive now supports loading queries by selecting a JSON file directly or dragging the file into the dialog.
#15632
# Improved lookup dialog
The lookup dialog in the web console now includes additional optional fields. See JDBC lookup for more information.
#15472
# Improved time chart brush and added auto-granularity
Improved the web console Explore view, including the time chart brush and automatic granularity detection. #14990
# Other web console improvements
EXPLAIN PLAN
queries in the workbench and run them individually #15570waitUntilSegmentLoad
would always be set totrue
even if explicitly set tofalse
#15781# General ingestion
# Added system fields to input sources
Added the option to return system fields when defining an input source. This allows for ingestion of metadata, such as an S3 object's URI.
#15276
# Changed how Druid allocates weekly segments
When the requested granularity is a month or larger but a segment can't be allocated, Druid resorts to day partitioning.
Unless explicitly specified, Druid skips week-granularity segments for data partitioning because these segments don't align with the end of the month or more coarse-grained intervals.
#15589
# Changed how empty or null array columns are stored
Columns ingested with the `auto` column indexer that contain only empty or null arrays are now stored as `ARRAY<LONG>` instead of `COMPLEX<json>`. #15505
# Kill task improvements

Improved kill tasks as follows:

- Resolved an issue where the auto-kill feature failed to honor the specified buffer period. This occurred when multiple unused segments within an interval were marked as unused at different times.
- You can submit kill tasks with an optional parameter `maxUsedStatusLastUpdatedTime`. When set to a date time, the kill task considers segments in the specified interval marked as unused no later than this time. The default behavior is to kill all unused segments in the interval regardless of the time when the segments were marked as unused.

#15710
# Segment allocation improvements
Improved segment allocation.
# Other ingestion improvements

- Added the `evalDimension` method to the `RowFunction` interface #15452
- Added an alert for when the `taskQueue` reaches `maxSize` #15409
- Improved handling of columns with `hasMultipleValues = UNKNOWN` #15300
- Fixed an issue where `IOException` obfuscated S3 exceptions #15238
- … `[1000, 9999]` #15608
- Simplified `IncrementalIndex` and `OnHeapIncrementalIndex` by removing some parameters #15448
- Updated `OnheapIncrementalIndex` to no longer try to offer a thread-safe "add" method #15697

# SQL-based ingestion
# Added castToType parameter

Added optional `castToType` parameter to the `auto` column schema. #15417
# Improved the EXTEND operator

The EXTEND operator now supports the following array types: `VARCHAR ARRAY`, `BIGINT ARRAY`, `FLOAT ARRAY`, and `DOUBLE ARRAY`. #15458

The following example shows an extern input with Druid native input types `ARRAY<STRING>`, `ARRAY<LONG>`, and `STRING`:
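(A minimal sketch; the inline data and the column names `a`, `b`, and `c` are hypothetical.)

```sql
SELECT *
FROM TABLE(
  EXTERN(
    '{"type": "inline", "data": "{\"a\": [\"x\", \"y\"], \"b\": [1, 2], \"c\": \"z\"}"}',
    '{"type": "json"}'
  )
) EXTEND ("a" VARCHAR ARRAY, "b" BIGINT ARRAY, "c" VARCHAR)
```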
# Improved tombstone generation to honor granularity specified in a REPLACE query

MSQ REPLACE queries now generate tombstone segments honoring the segment granularity specified in the query rather than generating irregular tombstones. If a query generates more than 5000 tombstones, Druid returns an MSQ `TooManyBucketsFault` error, similar to the behavior with data segments. #15243
# Improved hash joins using filters
Improved consistency of JOIN behavior for queries using either the native or MSQ task engine to prune based on base (left-hand side) columns only.
#15299
# Configurable page size limit

You can now limit the page size for results of SELECT queries run using the MSQ task engine. See `rowsPerPage` in the SQL-based ingestion reference.

# Streaming ingestion
# Improved Amazon Kinesis automatic reset
Changed Amazon Kinesis automatic reset behavior to only reset the checkpoints for partitions where sequence numbers are unavailable.
#15338
# Querying
# Added IPv6_MATCH SQL function
Added the IPv6_MATCH SQL function for matching IPv6 addresses in a subnet. #15212
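For example (a minimal sketch; the `logs` datasource, `ipv6_address` column, and subnet are hypothetical):

```sql
SELECT *
FROM "logs"
WHERE IPV6_MATCH("ipv6_address", '75e9:efa4:29c6:85f6::/64')
```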
# Added JSON_QUERY_ARRAY function
Added JSON_QUERY_ARRAY, which is similar to JSON_QUERY except the return type is always `ARRAY<COMPLEX<json>>` instead of `COMPLEX<json>`. Essentially, this function allows extracting arrays of objects from nested data and performing operations such as UNNEST, ARRAY_LENGTH, ARRAY_SLICE, or any other available ARRAY operations. #15521
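For example, a minimal sketch that extracts an array of objects and unnests it (the `events` datasource and `payload` nested column are hypothetical; depending on your setup, UNNEST may also require the `enableUnnest` context parameter):

```sql
SELECT item
FROM "events",
  UNNEST(JSON_QUERY_ARRAY("payload", '$.items')) AS t (item)
```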
# Added support for aggregateMultipleValues

Improved the ANY_VALUE(expr) function to support the boolean option `aggregateMultipleValues`. The `aggregateMultipleValues` option is enabled by default. When you run ANY_VALUE on an MVD, the function returns the stringified array. If `aggregateMultipleValues` is set to `false`, ANY_VALUE returns the first value instead. #15434
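For example (a minimal sketch; the `events` datasource and `tags` multi-value column are hypothetical, and the option is assumed to be passed as the third argument of `ANY_VALUE(expr, maxBytesPerValue, aggregateMultipleValues)`):

```sql
-- Default: returns the stringified array for multi-value rows
SELECT ANY_VALUE("tags", 1024, true) FROM "events"

-- Returns the first value instead
SELECT ANY_VALUE("tags", 1024, false) FROM "events"
```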
# Added native arrayContainsElement filter

Added a native `arrayContainsElement` filter to improve performance when using ARRAY_CONTAINS on array columns. #15366 #15455

Also, ARRAY_OVERLAP now uses the `arrayContainsElement` filter when filtering ARRAY typed columns, so that it can use indexes like ARRAY_CONTAINS. #15451
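For example (a minimal sketch; the `events` datasource and `tags` array column are hypothetical), the following filter can now plan to the native `arrayContainsElement` filter and take advantage of indexes:

```sql
SELECT *
FROM "events"
WHERE ARRAY_CONTAINS("tags", 'error')
```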
# Added index support

Improved nested JSON columns as follows:

- Added `ValueIndexes` and `ArrayElementIndexes` for nested arrays.
- Added `ValueIndexes` for nested long and double columns.

#15752
# Improved timestamp_extract function

The `timestamp_extract(expr, unit, [timezone])` Druid native query function now supports dynamic values. #15586
# Improved JSON_VALUE and JSON_QUERY
Added support for using expressions to compute the JSON path argument for JSON_VALUE and JSON_QUERY functions dynamically. The JSON path argument doesn't have to be a constant anymore.
#15320
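For example (a minimal sketch; the `events` datasource and its columns are hypothetical), the path is computed from another column instead of a constant:

```sql
SELECT JSON_VALUE("payload", '$.' || "field_name")
FROM "events"
```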
# Improved filtering performance for lookups

Enhanced filtering performance for lookups as follows:

- Added the `sqlReverseLookupThreshold` SQL query context parameter. `sqlReverseLookupThreshold` represents the maximum size of an IN filter that will be created as part of lookup reversal #15832
- Improved loading and dropping of containers for lookups to reduce inconsistencies during updates #14806
- Changed behavior for initialization of lookups to load the first lookup as is, regardless of cache status #15598

# Enabled query request queuing by default when total laning is turned on
When query scheduler threads are less than server HTTP threads, total laning turns on. This reserves some HTTP threads for non-query requests such as health checks. Previously, total laning would reject any query request that exceeded the lane capacity. Now, excess requests are instead queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`. #15440
# Other querying improvements

- Added a supplier that can return `NullValueIndex` to be used by `NullFilter`. This improvement should speed up `is null` and `is not null` filters on JSON columns #15687
- Added an option to compare results with relative error tolerance #15429
- Added capability for the Broker to access datasource schemas defined in the catalog when processing SQL queries #15469
- Added CONCAT flattening and filter decomposition #15634
- Enabled ARRAY_TO_MV to support expression inputs #15528
- Improved `ExpressionPostAggregator` to handle ARRAY types output by the grouping engine #15543
- Improved the error message you get when there's an error in the specified interval #15454
- Improved error reporting for math functions #14987
- Improved handling of COALESCE, SEARCH, and filter optimization #15609
- Increased memory available for subqueries when the query scheduler is configured to limit queries below the number of server threads #15295
- Optimized the SQL planner for filter expressions by introducing column indexes for expression virtual columns #15585
- Optimized queries involving large NOT IN operations #15625
- Fixed an issue with nested empty array fields #15532
- Fixed an NPE with virtual expressions and unnest #15513
- Fixed an issue with AND and OR operators and numeric `nvl` not clearing out stale null vectors for vector expression processing #15587
- Fixed an issue with filtering columns when using partial paths such as in JSON_QUERY #15643
- Fixed queries that raise an exception when sketches are stored in cache #15654
- Fixed queries involving JSON functions that failed when using negative indexes #15650
- Fixed an issue where queries involving filters on TIME_FLOOR could encounter a `ClassCastException` when comparing `RangeValue` in `CombineAndSimplifyBounds` #15778

# Data management
# Changed numCorePartitions to 0 for tombstones

Tombstone segments now have 0 core partitions. This means they can be dropped or removed independently without affecting availability of other appended segments in the same co-partition space. Prior to this change, removing tombstones with 1 core partition that contained appended segments in the partition space could make the appended segments unavailable. #15379
# Clean up duty for non-overlapping eternity tombstones
Added `MarkEternityTombstonesAsUnused` to clean up non-overlapping eternity tombstones: tombstone segments that either start at `-INF` or end at `INF` and don't overlap with any overshadowed used segments in the datasource.

Also added a new metric `segment/unneededEternityTombstone/count` to count the number of dropped non-overshadowed eternity tombstones per datasource. #15281
# Enabled skipping compaction for datasources with partial-eternity segments
Druid now skips compaction for datasources with segments that have their interval start or end coinciding with Eternity interval end-points.
#15542
# Enhanced the JSON parser unexpected token logging
The JSON parser unexpected token error now includes the context of the expected `VALUE_STRING` token. This makes it easier to track mesh/proxy network error messages and to avoid unnecessary research into Druid server REST endpoint responses. #15176
# Other data management improvements

- Fixed an issue where the Broker would return an HTTP 400 status code instead of 503 when a Coordinator was temporarily unavailable, such as during a rolling upgrade #15756
- Added user identity to Router query request logs #15126
- Improved the process to retrieve segments from the metadata store by retrieving segments in batches #15305
- Improved logging messages when skipping auto-compaction for a datasource #15460
- Improved compaction by modifying the segment iterator to skip intervals without data #15676
- Increased `_acceptQueueSize` based on the value of `net.core.somaxconn` #15596
- Optimized the process to mark segments as unused #15352
- Updated auto-compaction to preserve spatial dimensions rather than rewrite them into regular string dimensions #15321

# Metrics and monitoring
- Updated the `serviceName` for the `segment/count` metric to match the configured metric name within the StatsD emitter #15347

# Extensions
# Basic security improvements
The computed hash values of passwords are now cached for the `druid-basic-security` extension to boost authentication validator performance. #15648
# DataSketches improvements

- Improved the performance of HLL sketch merge aggregators #15162
- Updated histogram post-aggregators for Quantiles and KLL sketches for when all values in the sketch are equal. Previously, these queries failed; they now return `[N, 0, 0, ...]`, where N is the number of values in the sketch and the length of the list is equal to the value assigned to `numBins` #15381

# Microsoft Azure improvements
- Improved file deletion by using the `batchDeleteFiles` method in Azure Storage #15730

# Kubernetes improvements
# Kafka emitter improvements
- Added a config option to the Kafka emitter that lets you mask sensitive values for the Kafka producer. This feature is optional and doesn't affect prior configs for the emitter #15485
- Resolved `InterruptedException` logging in ingestion task logs #15519

# Prometheus emitter improvements
You can configure the `pushgateway` strategy to delete metrics from the Prometheus push gateway on task shutdown using the following Prometheus emitter configurations:

- `druid.emitter.prometheus.deletePushGatewayMetricsOnShutdown`: When set to `true`, peon tasks delete metrics from the Prometheus push gateway on task shutdown. Default value is `false`.
- `druid.emitter.prometheus.waitForShutdownDelay`: Time in milliseconds to wait for peon tasks to delete metrics from the push gateway on shutdown. Applicable only when `druid.emitter.prometheus.deletePushGatewayMetricsOnShutdown` is set to `true`. Default value is none, meaning that there is no delay between peon task shutdown and metrics deletion from the push gateway.

#14935
# Iceberg improvements
Improved the Iceberg extension as follows:
- Added a parameter `snapshotTime` to the Iceberg input source spec that allows the user to ingest data files associated with the most recent snapshot as of the specified time. This helps the user ingest data based on older snapshots #15348
- Added a new Iceberg ingestion filter of type `range` to filter on ranges of column values #15782
- Fixed a typo in the Iceberg warehouse path for S3 #15823

# Upgrade notes and incompatible changes
# Upgrade notes
# Changed equals filter for native queries

The equality filter on mixed type `auto` columns that contain arrays must now be filtered as their presenting type. This means that if any rows are arrays (for example, the segment metadata and `information_schema` report the type as some array type), then the native queries must also filter as if they are some array type.

This change impacts mixed type `auto` columns that contain both scalars and arrays. It doesn't impact SQL, which already has this limitation due to how the type presents itself. #15503
# Console automatically sets arrayIngestMode for MSQ queries

The Druid console now configures the `arrayIngestMode` parameter in the data loading flow, and its value can persist across the SQL tab unless manually updated. Therefore, when loading multi-value dimensions or arrays in the Druid web console, note the value of the `arrayIngestMode` parameter to avoid accidentally mixing MVDs and arrays in the same column of a datasource. #15588
# Improved concurrent append and replace (experimental)
You no longer have to manually determine the task lock type for concurrent append and replace (experimental) with the `taskLockType` task context. Instead, Druid can now determine it automatically for you. You can use the context parameter `"useConcurrentLocks": true` for individual tasks and datasources or enable concurrent append and replace at a cluster level using `druid.indexer.task.default.context`. #15684
# Enabled empty ingest queries
The MSQ task engine now allows empty ingest queries by default. For queries that don't generate any output rows, the MSQ task engine reports zero values for `numTotalRows` and `totalSizeInBytes` instead of null. Previously, ingest queries that produced no data would fail with the `InsertCannotBeEmpty` MSQ fault.

To revert to the original behavior, set the MSQ query parameter `failOnEmptyInsert` to `true`. #15495 #15674
# Enabled query request queuing by default when total laning is turned on
When query scheduler threads are less than server HTTP threads, total laning turns on. This reserves some HTTP threads for non-query requests such as health checks. Previously, total laning would reject any query request that exceeded the lane capacity. Now, excess requests are instead queued with a timeout equal to `MIN(Integer.MAX_VALUE, druid.server.http.maxQueryTimeout)`. #15440
# Changed how empty or null array columns are stored
Columns ingested with the `auto` column indexer that contain only empty or null arrays are now stored as `ARRAY<LONG>` instead of `COMPLEX<json>`. #15505
# Changed how Druid allocates weekly segments
When the requested granularity is a month or larger but a segment can't be allocated, Druid resorts to day partitioning.
Unless explicitly specified, Druid skips week-granularity segments for data partitioning because these segments don't align with the end of the month or more coarse-grained intervals.
#15589
# Removed the auto search strategy

Removed the `auto` search strategy from the native search query. Setting `searchStrategy` to `auto` is now equivalent to `useIndexes`. #15550
# Developer notes
# Improved InDimFilter reverse-lookup optimization

This improvement includes the following changes:

- Added a `mayIncludeUnknown` parameter to `DimFilter#optimize`.
- Enabled `InDimFilter#optimizeLookup` to handle `mayIncludeUnknown` and perform reverse lookups in a wider range of cases.
- Made the `unapply` method in `LookupExtractor` protected and relocated callers to `unapplyAll`.

If your extensions provide a `DimFilter`, you may need to rebuild them to ensure compatibility with this release. #15611
# Other developer improvements
# Web console logging
The web console now logs request errors in end-to-end tests to help with debugging.
#15483
# Dependency updates
The following dependencies have been updated:
- Added `chronoshift` as a dependency #14990
- Added `gson` to `pom.xml` #15488
- Updated Confluent's dependencies to 6.2.12 #15441
- Excluded `jackson-jaxrs` from `ranger-plugin-common`, which isn't required, to address CVEs #15481
- Updated AWS SDK version to 1.12.638 #15814
- Updated Avro to 1.11.3 #15419
- Updated Ranger libraries to the newest available version #15363
- Updated the Iceberg core version to 1.4.1 #15348
- Reduced dependency footprint for the Iceberg extension #15280
- Updated `com.github.eirslett` version to 1.15.0 #15556
- Updated multiple webpack dependencies: `webpack` to 5.89.0, `webpack-bundle-analyzer` to 4.10.1, `webpack-cli` to 5.1.4, and `webpack-dev-server` to 4.15.1 #15555
- Updated `pac4j-oidc` java security library version to 4.5.7 #15522
- Updated `io.kubernetes.client-java` version to 19.0.0 and `docker-java-bom` to 3.3.4 #15449
- Updated core Apache Kafka dependencies to 3.6.1 #15539
- Updated and pruned multiple dependencies for the web console, including dropping Babel. As a result, Internet Explorer 11 is no longer supported with the web console #15487
- Updated Apache ZooKeeper to 3.8.3 from 3.5.10 #15477
- Updated Guava to 32.0.1 from 31.1 #15482
- Updated multiple dependencies to address CVEs #15464:
  - `dropwizard-metrics` to 4.2.22 to address GHSA-mm8h-8587-p46h in `com.rabbitmq:amqp-client`
  - `ant` to 1.10.14 to resolve GHSA-f62v-xpxf-3v68, GHSA-4p6w-m9wc-c9c9, GHSA-q5r4-cfpx-h6fh, and GHSA-5v34-g2px-j4fw
  - `commons-compress` to 1.24.0 to resolve GHSA-cgwf-w82q-5jrr
  - `jose4j` to 0.9.3 to resolve GHSA-7g24-qg88-p43q and GHSA-jgvc-jfgh-rjvv
  - `kotlin-stdlib` to 1.6.0 to resolve GHSA-cqj8-47ch-rvvq and CVE-2022-24329
- Updated Jackson to version 2.12.7.1 to address CVE-2022-42003 and CVE-2022-42004, which affect `jackson-databind` #15461
- Updated `com.google.code.gson:gson` from 2.2.4 to 2.10.1 since 2.2.4 is affected by CVE-2022-25647 #15461
- Updated Jedis to version 5.0.2 #15344
- Updated `commons-codec:commons-codec` from 1.13 to 1.16.0 #14819
- Updated Nimbus version to 8.22.1 #15753

# Credits
@17px
@317brian
@a2l007
@abhishekagarwal87
@abhishekrb19
@adarshsanjeev
@AlbericByte
@aleksi75
@AmatyaAvadhanula
@ankit0811
@aruraghuwanshi
@BartMiki
@benhopp
@bsyk
@clintropolis
@cristian-popa
@cryptoe
@dchristle
@dependabot[bot]
@ektravel
@fectrain
@findingrish
@gargvishesh
@georgew5656
@gianm
@hfukada
@hofi1
@HudsonShi
@janjwerner-confluent
@jon-wei
@kaisun2000
@KeerthanaSrikanth
@kfaraz
@kgyrtkirk
@krishnanand5
@LakshSingla
@legoscia
@lkm
@lorem--ipsum
@maytasm
@nasuiyile
@nozjkoitop
@oo007
@pagrawal10
@Pankaj260100
@pranavbhole
@rash67
@sb89594
@sekikn
@sergioferragut
@somu-imply
@suneet-s
@techdocsmith
@tejaswini-imply
@TestBoost
@TSFenwick
@Tts-233
@vinlee19
@vivek807
@vogievetsky
@vtlim
@writer-jill
@xvrl
@yashdeep97
@YongGang
@yuanlihan
@zachjsh