Add missing docs for lookup based task context properties #17562

Open
wants to merge 2 commits into
base: master
from

Conversation

a2l007
Contributor

@a2l007 a2l007 commented Dec 12, 2024

Adds documentation for task context properties: lookupLoadingMode and lookupsToLoad.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.

@a2l007
Contributor Author

a2l007 commented Dec 12, 2024

@Akshat-Jain Do you mind taking a look?

docs/ingestion/tasks.md (outdated)
@@ -470,6 +470,8 @@ The following parameters apply to all task types.
|`storeEmptyColumns`|Enables the task to store empty columns during ingestion. When `true`, Druid stores every column specified in the [`dimensionsSpec`](ingestion-spec.md#dimensionsspec). When `false`, Druid SQL queries referencing empty columns will fail. If you intend to leave `storeEmptyColumns` disabled, you should either ingest dummy data for empty columns or else not query on empty columns.<br/><br/>When set in the task context, `storeEmptyColumns` overrides the system property [`druid.indexer.task.storeEmptyColumns`](../configuration/index.md#additional-peon-configuration).|`true`|
|`taskLockTimeout`|Task lock timeout in milliseconds. For more details, see [Locking](#locking).<br/><br/>When a task acquires a lock, it sends a request via HTTP and awaits until it receives a response containing the lock acquisition result. As a result, an HTTP timeout error can occur if `taskLockTimeout` is greater than `druid.server.http.maxIdleTime` of Overlords.|300000|
|`useLineageBasedSegmentAllocation`|Enables the new lineage-based segment allocation protocol for the native Parallel task with dynamic partitioning. This option should be off during the replacing rolling upgrade from one of the Druid versions between 0.19 and 0.21 to Druid 0.22 or higher. Once the upgrade is done, it must be set to `true` to ensure data correctness.|`false` in 0.21 or earlier, `true` in 0.22 or later|
|`lookupLoadingMode`|Controls the lookup loading behavior in tasks. This property supports three values: `ALL` mode loads all the lookups, `NONE` mode does not load any lookups and `ONLY_REQUIRED` mode loads the lookups specified with context key `lookupsToLoad`.|`ALL`|
Contributor

We should call out that users must not specify this context parameter for MSQ tasks and kill tasks, as the system-computed value is always the right choice:

  • MSQControllerTask - hardcoded to NONE, cannot be overridden
  • MSQWorkerTask - ONLY_REQUIRED, lookupsToLoad are identified by the controller task by parsing the SQL
  • kill task - hardcoded to NONE, cannot be overridden

In the future, we might implement auto-detection of required lookups for batch ingest, streaming ingest and compact tasks too.
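
To make the rules above concrete, here is a minimal Python sketch of how the effective lookup loading behavior could be resolved from a task's type and context. It is only an illustration, not Druid's implementation: the function `resolve_lookup_loading`, the task-type labels (`msq_controller`, `kill`, `index_parallel`), and the lookup name `country_codes` are all hypothetical, and the sketch assumes `lookupsToLoad` is a list of lookup names. An MSQ worker is not special-cased here because, per the comment above, the controller is assumed to have already placed `ONLY_REQUIRED` and `lookupsToLoad` in the worker's context.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical task-type labels for this sketch. For these types the
# mode is fixed to NONE and any user-supplied value is ignored.
FORCED_MODES = {
    "msq_controller": "NONE",
    "kill": "NONE",
}

VALID_MODES = {"ALL", "NONE", "ONLY_REQUIRED"}


def resolve_lookup_loading(
    task_type: str, context: Dict[str, Any]
) -> Tuple[str, List[str]]:
    """Return (mode, lookup_names) for a task.

    lookup_names is only meaningful when mode is ONLY_REQUIRED;
    it is an empty list for ALL and NONE.
    """
    # Hardcoded task types cannot be overridden through the context.
    if task_type in FORCED_MODES:
        return FORCED_MODES[task_type], []

    # All other task types read the context, defaulting to ALL.
    mode = context.get("lookupLoadingMode", "ALL")
    if mode not in VALID_MODES:
        raise ValueError(f"Unsupported lookupLoadingMode: {mode!r}")

    if mode == "ONLY_REQUIRED":
        # Assumption: lookupsToLoad holds a list of lookup names.
        return mode, list(context.get("lookupsToLoad", []))
    return mode, []


# Example task context that loads a single (hypothetical) lookup.
example_context = {
    "lookupLoadingMode": "ONLY_REQUIRED",
    "lookupsToLoad": ["country_codes"],
}
```

With this sketch, a user-supplied `lookupLoadingMode` on a kill task has no effect, which is why the docs should steer users away from setting it there.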

Contributor Author

Thanks @kfaraz, I've added more details here based on your comments.

2 participants