LogsDB host.name mapping conflicts #118686

Open
salvatore-campagna opened this issue Dec 13, 2024 · 1 comment

Comments

@salvatore-campagna
Contributor

salvatore-campagna commented Dec 13, 2024

Description

In Elasticsearch, data streams that use LogsDB are sorted by host.name and @timestamp by default. To make this default index sorting work, Elasticsearch injects a default mapping that defines host.name as a keyword field. This can cause compatibility issues when customers upgrade to LogsDB if their existing host.name mapping is incompatible with the keyword type that LogsDB expects. Moreover, LogsDB may be adopted automatically for data streams matching the logs-*-* pattern, so customers can be unaware of possible mapping conflicts.
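For illustration, the defaults described above correspond roughly to the following index settings and mapping, written here as Python dictionaries. This is only a sketch, not the exact payload Elasticsearch generates: the sort orders and the nested-object layout of the mapping are assumptions.

```python
# Rough equivalent of what a LogsDB backing index gets by default.
# Assumption: the sort orders below are illustrative, not confirmed by this issue.
logsdb_index_settings = {
    "index.mode": "logsdb",
    "index.sort.field": ["host.name", "@timestamp"],
    "index.sort.order": ["asc", "desc"],
}

# Mapping injected so that sorting on host.name is always possible.
# Assumption: shown in nested-object form; the real injected mapping may differ.
injected_mappings = {
    "properties": {
        "host": {"properties": {"name": {"type": "keyword"}}},
    }
}
```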

Specifically, users whose existing host.name mapping uses a non-keyword type may encounter errors during mapping updates and/or template composition. This affects users with custom mappings that are not ECS-compliant. The errors arise because Elasticsearch cannot merge incompatible field types. As a result, switching to LogsDB during an upgrade might cause mappings to be rejected, preventing a smooth transition to the new index mode for logs.
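The conflict can be sketched with a simplified check. The helpers below (`field_type`, `check_host_name`) are hypothetical and are not Elasticsearch's merge logic; following the description above, they treat any existing type other than keyword as a conflict, and they only handle the nested-object mapping layout.

```python
def field_type(mappings, path):
    """Return the declared type of a dotted field path, or None if unmapped."""
    node = mappings
    for part in path.split("."):
        node = node.get("properties", {}).get(part)
        if node is None:
            return None
    # A node with sub-properties and no explicit type is an object field.
    return node.get("type", "object")


def check_host_name(existing_mappings, injected_type="keyword"):
    """Raise if the existing host.name type cannot coexist with the injected one."""
    existing = field_type(existing_mappings, "host.name")
    if existing is not None and existing != injected_type:
        # Illustrative message only, not the error Elasticsearch reports.
        raise ValueError(
            f"incompatible host.name mapping: existing={existing}, injected={injected_type}"
        )
    return injected_type
```

For example, a template that maps host.name as `text` would raise here, while an unmapped or keyword host.name would pass.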

Possible solutions include:

  1. Do not inject the host.name field and sort only by @timestamp (which already exists as a date field in data streams). This would avoid the conflict, but could result in suboptimal index sorting, affecting query performance and compression efficiency.
  2. Skip the logsdb index mode if a conflicting mapping is detected during index creation. This would avoid the conflict, but the affected logs would not benefit from the optimisations offered by LogsDB.
  3. Treat host.name as an "optional" sort field, sorting on it only if doing so does not cause mapping issues. As with the first option, this could result in suboptimal index sorting.
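To make the trade-offs between the three options concrete, here is a minimal sketch of the decision each one implies at index creation time. All names (`IndexPlan`, `plan_backing_index`) are hypothetical, and falling back to the `standard` index mode for option 2 is an assumption; the issue does not say which mode would be used instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexPlan:
    index_mode: str
    sort_fields: List[str]
    inject_host_name: bool


def plan_backing_index(host_name_type: Optional[str], option: int) -> IndexPlan:
    """host_name_type is the existing host.name type, or None if unmapped."""
    compatible = host_name_type in (None, "keyword")
    full = IndexPlan("logsdb", ["host.name", "@timestamp"], True)
    timestamp_only = IndexPlan("logsdb", ["@timestamp"], False)

    if option == 1:
        # Never inject host.name; always sort by @timestamp only.
        return timestamp_only
    if option == 2:
        # Skip LogsDB entirely when the mapping conflicts (assumed fallback mode).
        return full if compatible else IndexPlan("standard", [], False)
    if option == 3:
        # Keep LogsDB, but sort on host.name only when it is safe to do so.
        return full if compatible else timestamp_only
    raise ValueError(f"unknown option: {option}")
```

With an existing `text` mapping, option 2 yields a non-LogsDB index, whereas options 1 and 3 keep LogsDB but drop host.name from the sort.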

Note that in all cases the solution applies on a per-data-stream basis and does not affect data streams with compatible mappings. The issue is therefore confined to the subset of logs data streams that have an incompatible host.name mapping.

The second option has the advantage of letting users adopt LogsDB later, after fixing their mapping, without disrupting the upgrade process and without preventing data stream rollover. It does, however, require a mechanism to make users aware of the issue so that they can act on it.

Also keep in mind that LogsDB adoption requires a rollover, so the mapping issue might surface days, weeks or even months after upgrading, depending on the rollover policy of each data stream.

@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)
