Segment metadata #456

Merged: 2 commits merged on May 19, 2023


@bsyk bsyk commented May 19, 2023

There are two issues with Druid segment metadata:

  1. If a column's type changes between segments, the entry for that column defaults to a string type and its errorMessage is populated. Such column entries cannot be used. Because we treat all string columns as dimensions, a metric whose type changes is listed as a dimension instead of a metric and therefore cannot be queried.
  2. The HLLSketchBuilder ingestion aggregator initially reports a column type of HLLSketchBuilder, but as soon as the data is persisted into an actual segment the type becomes HLLSketch. This produces the same symptom as (1), since the type appears to have changed.
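
To make the symptom in (1) concrete, here is a minimal Python sketch of the "every string column is a dimension" rule applied to a merged column listing. The column names, the `errorMessage` text and the `classify_by_type` helper are all hypothetical and only illustrate the description above; this is not the project's code.

```python
# Hypothetical merged column listing; names and the errorMessage text are
# illustrative, not copied from a real cluster.
merged_columns = {
    "__time": {"type": "LONG", "errorMessage": None},
    "nf.app": {"type": "STRING", "errorMessage": None},
    # A metric whose type differs between segments: reported as a string
    # column with a non-null errorMessage.
    "requests.latency": {
        "type": "STRING",
        "errorMessage": "error:cannot_merge_diff_types",
    },
}


def classify_by_type(columns):
    """Old rule as described above: every string column is a dimension."""
    dimensions, metrics = [], []
    for name, info in columns.items():
        if name == "__time":
            continue  # the time column is neither a dimension nor a metric
        if info["type"] == "STRING":
            dimensions.append(name)
        else:
            metrics.append(name)
    return dimensions, metrics


dimensions, metrics = classify_by_type(merged_columns)
print(dimensions)  # the broken metric is listed here
print(metrics)     # and is missing here
```

Under this rule `requests.latency` ends up among the dimensions, so it can no longer be queried as a metric.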

We encounter problem (1) when someone accidentally adds a new metric with the wrong type and needs to fix it, e.g. as a spectatorHistogramTimer when it was supposed to be a spectatorHistogramDistribution. Today we have to run a Hadoop job to restate that column under the correct type.

We encounter problem (2) when ingesting HLLSketches from raw values using realtime ingestion. It does not appear with batch ingestion, since the segment and its metadata are already finalized by the time the cluster loads the segment.

bsyk added 2 commits May 19, 2023 12:54
These are typically columns where a metric type has changed between segments. The column type defaults to string, but the entry carries a non-null errorMessage.
Previously we assumed such a column was a dimension, since it is reported as string. It is now omitted.
Requires using lenient aggregator merge if we want to accommodate changing types.
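
For reference, a hedged sketch of a segment metadata query that asks for aggregators with lenient merging. The field names (`analysisTypes`, `merge`, `lenientAggregatorMerge`) come from Druid's segmentMetadata query; the datasource name, the interval and the `segment_metadata_query` helper are placeholders, and the exact set of fields this PR sends is an assumption.

```python
import json


def segment_metadata_query(datasource, interval):
    """Build a segmentMetadata query body (hypothetical helper)."""
    return {
        "queryType": "segmentMetadata",
        "dataSource": datasource,
        "intervals": [interval],
        # Ask for the per-column aggregators in addition to column info.
        "analysisTypes": ["aggregators"],
        # Merge the per-segment results into one; assumed, not confirmed
        # by the PR text.
        "merge": True,
        # A conflict only affects the conflicting column instead of the
        # whole aggregator map.
        "lenientAggregatorMerge": True,
    }


query = segment_metadata_query("example_datasource", "2023-05-18/2023-05-19")
print(json.dumps(query, indent=2))
```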
@brharrington brharrington merged commit 38c6aae into Netflix-Skunkworks:main May 19, 2023
@bsyk bsyk deleted the segment-metadata branch May 19, 2023 20:43
manolama pushed a commit to manolama/iep-apps that referenced this pull request Oct 25, 2023
Changes:

1. Filter out columns that report an error. These are typically columns where
   a metric type has changed between segments. The column type defaults to
   string, but the entry carries a non-null errorMessage. Previously we
   assumed such a column was a dimension, since it is reported as string. It
   is now omitted from the visible columns.
2. Use aggregators to define metrics rather than columns. This requires
   lenient aggregator merge if we want to accommodate changing types.
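
A minimal Python sketch of the two changes, under stated assumptions: the response shape, the column names and the `classify` helper are hypothetical, and a conflicting aggregator is assumed to come back as null (`None`) when lenient merge is used. It illustrates the rules above and is not the code in this PR.

```python
def classify(columns, aggregators):
    """Return (dimensions, metrics) following the two changes listed above."""
    # Change 1: ignore any column entry that carries an errorMessage.
    usable = {
        name: info
        for name, info in columns.items()
        if info.get("errorMessage") is None
    }
    # Change 2: metrics come from the aggregators map, not from column types.
    # Assumption: with lenient merge a conflicting aggregator is None.
    metrics = {
        name: agg["type"]
        for name, agg in aggregators.items()
        if agg is not None
    }
    dimensions = [
        name
        for name, info in usable.items()
        if info["type"] == "STRING" and name not in metrics
    ]
    return dimensions, metrics


# Hypothetical merged metadata; all names and values are illustrative.
columns = {
    "nf.app": {"type": "STRING", "errorMessage": None},
    "requests.latency": {"type": "STRING", "errorMessage": "error:cannot_merge_diff_types"},
    "users.distinct": {"type": "STRING", "errorMessage": "error:cannot_merge_diff_types"},
}
aggregators = {
    # Type changed between segments, so the aggregator conflicts.
    "requests.latency": None,
    # Column type appeared to change, but the aggregator is consistent.
    "users.distinct": {"type": "HLLSketchBuild", "name": "users.distinct", "fieldName": "user"},
}

dimensions, metrics = classify(columns, aggregators)
print(dimensions)
print(metrics)
```

In this sketch the errored columns no longer appear as dimensions, and a metric is listed whenever its aggregator survives the merge, whatever the column entry reports.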