[pkg/ottl] Support for extracting OS attributes from UserAgent #35458

Open
rogercoll opened this issue Sep 27, 2024 · 5 comments
Labels
enhancement New feature or request pkg/ottl priority:p3 Lowest waiting-for:semantic-conventions Waiting on something on semantic-conventions to be stabilized

Comments

@rogercoll
Contributor

Component(s)

pkg/ottl

Is your feature request related to a problem? Please describe.

UserAgent semantic convention attributes can be extracted using the OTTL UserAgent function: https://github.com/pchila/opentelemetry-collector-contrib/tree/7da12e47eb9cf719aa593f9935bce9ba72844703/pkg/ottl/ottlfuncs#useragent (implemented in #34172)

The currently extracted attributes are user_agent.name, user_agent.version and user_agent.original. More information, such as OS details, can be extracted from the user_agent.original string.

Semantic conventions proposal: open-telemetry/semantic-conventions#1433
Current Elastic ECS user_agent OS attributes: https://www.elastic.co/guide/en/ecs/current/ecs-user_agent.html#_field_reuse_30

Describe the solution you'd like

Extract additional fields from the user_agent:

  • user_agent.os.type
  • user_agent.os.name
  • user_agent.os.version
  • user_agent.os.build_id
  • user_agent.os.description

Describe alternatives you've considered

No response

Additional context

This functionality would be very helpful for log and metric analytics. For example, an Nginx Ingress Controller log record contains the user agent, so this function could be configured in the collector to extract OS information from all Nginx logs. Dashboards and alerts can be built on this information: Which OS has the most errors? Which OS versions are most common?

@rogercoll added the enhancement and needs triage labels on Sep 27, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@ioandr

ioandr commented Oct 6, 2024

Hi @rogercoll, I took a quick look at this.

It looks like the UA parser provides a function to parse OS info from a user agent string.

In https://github.com/ua-parser/uap-core/blob/master/tests/test_os.yaml I found various user agent strings. However, all expected test results consist of family, major, minor, patch and patch_minor, not type, name, version, build_id and description, so there is no 1:1 mapping between the two.

As a first iteration we could map:

  • user_agent.os.name to Os.family
  • user_agent.os.version to {Os.major}.{Os.minor}.{Os.patch}.{Os.patch_minor}

Maybe we could also set user_agent.os.type by performing a lookup based on Os.family. e.g., Android -> Linux, WatchOS -> iOS, etc.
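
As a rough illustration of this first iteration, here is a minimal Go sketch. It assumes a hypothetical `osInfo` struct that only mirrors the fields seen in test_os.yaml (it is not the UA parser's own type), and the `osTypeByFamily` table contains just the two examples mentioned above, so it is deliberately incomplete. The helper names are made up for this sketch.

```go
package main

import "strings"

// osInfo is a hypothetical stand-in for the parser result; it only mirrors
// the fields that appear in uap-core's test_os.yaml expectations.
type osInfo struct {
	Family, Major, Minor, Patch, PatchMinor string
}

// osVersion joins the version components with dots and stops at the first
// empty component, so Major "10", Minor "15" gives "10.15" and not "10.15..".
func osVersion(o osInfo) string {
	parts := []string{}
	for _, p := range []string{o.Major, o.Minor, o.Patch, o.PatchMinor} {
		if p == "" {
			break
		}
		parts = append(parts, p)
	}
	return strings.Join(parts, ".")
}

// osTypeByFamily is an illustrative lookup table (assumption): it holds only
// the two examples from the discussion and is far from exhaustive.
var osTypeByFamily = map[string]string{
	"Android": "Linux",
	"WatchOS": "iOS",
}

// osAttributes builds the proposed attributes, omitting any that are unknown.
func osAttributes(o osInfo) map[string]string {
	attrs := map[string]string{}
	if o.Family != "" {
		attrs["user_agent.os.name"] = o.Family
	}
	if v := osVersion(o); v != "" {
		attrs["user_agent.os.version"] = v
	}
	if t, ok := osTypeByFamily[o.Family]; ok {
		attrs["user_agent.os.type"] = t
	}
	return attrs
}
```

With this sketch, `osAttributes(osInfo{Family: "Android", Major: "14"})` would set user_agent.os.name to Android, user_agent.os.version to 14 and user_agent.os.type to Linux. Families missing from the table simply get no user_agent.os.type.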

What about the rest of the fields you proposed though?

@rogercoll
Contributor Author

@ioandr Thanks for taking a look into this. Based on your research, there are three attributes which we cannot map 1:1 with the UA package parser function. I would propose the following:

  • user_agent.os.type: Same as you shared, a lookup map based on Os.family
  • user_agent.os.build_id: {Os.patch_minor}?
  • user_agent.os.description: The whole OS string included in the User Agent. For example, for "Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0" this would be "X11; Linux x86_64; rv:127.0"

That said, I would not make these a blocker: if extracting them is not clear or feasible, I would start with the 1:1 mapping with the UA package.
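
To make the user_agent.os.description idea concrete, here is a naive Go sketch that returns the content of the first parenthesized group of the user agent string. This is only a heuristic under the assumption that the OS string sits in that first group; `osDescription` is a hypothetical helper, not part of the UA package, and it does not handle nested parentheses.

```go
package main

import "strings"

// osDescription returns the content of the first parenthesized group in a
// user agent string, and false when no complete group is found.
// Assumption: the OS string lives in that first group, which does not hold
// for every user agent.
func osDescription(ua string) (string, bool) {
	open := strings.Index(ua, "(")
	if open < 0 {
		return "", false
	}
	// Search for the closing parenthesis only after the opening one.
	end := strings.Index(ua[open+1:], ")")
	if end < 0 {
		return "", false
	}
	return strings.TrimSpace(ua[open+1 : open+1+end]), true
}
```

For the Firefox example in the list above this yields X11; Linux x86_64; rv:127.0, while a user agent without parentheses, such as curl/7.81.0, yields no description at all.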

@atoulme removed the needs triage label on Oct 12, 2024
@ioandr

ioandr commented Oct 15, 2024

Thanks for the follow-up @rogercoll, I will take a stab at this and open a PR shortly.

@ioandr

ioandr commented Oct 20, 2024

Hi @rogercoll I opened a PR that adds name and version as discussed above. I also updated existing test cases as needed.

For the time being I didn't add the extra fields for the reasons below:

  1. type: I couldn't find an exhaustive, trustworthy mapping to go from OS family to OS type. Let's tackle this in the next iteration
  2. build_id: After searching online, mapping this to patch_minor does not look accurate. Build IDs are mostly seen on Windows (e.g., 22621) and macOS (e.g., 20B29)
  3. description: it seems that the UA parser does not provide a function to return the "original OS string". This probably requires some regex matching which might be tricky to get right for all user agent strings

Other than these, please let me know if I need to update any OTel Collector documentation. I couldn't find any relevant places other than the semantic conventions documentation:

https://opentelemetry.io/docs/specs/semconv/attributes-registry/os/

@TylerHelmuth added the priority:p3 label on Nov 15, 2024
@mx-psi added the waiting-for:semantic-conventions label on Nov 20, 2024

5 participants