Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error event logs should include plugin id #4567

Open
DavidKnacker opened this issue Jul 26, 2024 · 20 comments
Open

Error event logs should include plugin id #4567

DavidKnacker opened this issue Jul 26, 2024 · 20 comments
Labels
enhancement Feature request or improve operations

Comments

@DavidKnacker
Copy link

Describe the bug

The @id plugin parameter is used to give each plugin its unique id. This id should be included in logs generated by said plugin.
This seems to work inconsistent between different plugin types.
I tested the json filter plugin (for which the id is not shown) and the loki output plugin (which shows it)

working example:

<source>
  @type sample
  @id id_sample_too_long
  sample {"data":"this log is way to long to fit into one chunk"}
  tag sample.long
</source>

<match **>
  @type loki
  @id id_too_long_message
  url ...
  ...
<buffer>
  @type file
  chunk_limit_size 5
  ...
</buffer>
</match>

results in logs:

2024-07-26 13:28:37 +0000 [warn]: #0 [id_too_long_message] chunk bytes limit exceeds for an emitted event stream: 64bytes
2024-07-26 13:28:37.073654307 +0000 fluent.warn: {"message":"[id_too_long_message] chunk bytes limit exceeds for an emitted event stream: 64bytes"}

Both contain the "id_too_long_message" id of the plugin the error occured in.

But for the filter plugin:

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

The error:

2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

does not contain the id "id_not_json" of the filter plugin that caused the error.

To Reproduce

The config:

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>

results in:

2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

Expected behavior

The @id fiel, if set, shows up in logs prodcued by the plugin. In this example I would expect:

2024-07-26 13:21:57 +0000 [warn]: #0 [id_not_json] dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"# [id_not_json] <Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}

Your Environment

- Fluentd version: 1.17.0
- Package version:
- Operating system: Garden Linux 1443.9
- Kernel version: 6.6.35-cloud-arm64

Your Configuration

<source>
  @type sample
  @id id_sample_not_json
  sample {"data":"this is not json}"}
  tag sample.json
</source>

<filter sample.json>
  @type parser
  @id id_not_json
  key_name data
  <parse>
    @type json
  </parse>
</filter>


### Your Error Log

```shell
2024-07-26 13:21:57 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data 'this is not json}'" location=nil tag="sample.json" time=2024-07-26 13:21:57.079027108 +0000 record={"data"=>"this is not json}"}
2024-07-26 13:21:57.079191057 +0000 fluent.warn: {"error":"#<Fluent::Plugin::Parser::ParserError: pattern not matched with data 'this is not json}'>","location":null,"tag":"sample.json","time":1722000117,"record":{"data":"this is not json}"},"message":"dump an error event: error_class=Fluent::Plugin::Parser::ParserError error=\"pattern not matched with data 'this is not json}'\" location=nil tag=\"sample.json\" time=2024-07-26 13:21:57.079027108 +0000 record={\"data\"=>\"this is not json}\"}"}


### Additional context

_No response_
@Athishpranav2003
Copy link
Contributor

@DavidKnacker is the id missing even for logs generated by input plugin?

@DavidKnacker
Copy link
Author

@Athishpranav2003 I haven't checked it for error logs thrown by an input plugin yet.
But am I correct to assume that using @id to track errors should work for all kind of plugins? (or at least the ones that are shipped by default with fluentd?)

@Athishpranav2003
Copy link
Contributor

Ideally yes but since it's not behaving in that way I was checking what's the difference in underlying code
Will check it meanwhile

Just ping me later with the case of input plugin

@DavidKnacker
Copy link
Author

👍
will check the behavior for the input plugin tomorrow and report back

@kenhys
Copy link
Contributor

kenhys commented Jul 31, 2024

another scope of Idea: track plugin which emit error event in router side, then emit more user friendly error message.

@daipom
Copy link
Contributor

daipom commented Jul 31, 2024

Thanks for your report.
This is Fluentd's specification.

This dump an error event log is generated by RootAgent (Fluentd's common logic).
The filter_parser plugin just emits the error event to the router.
So, this log does not have its id.

def emit_error_event(tag, time, record, error)
error_info = {error: error, location: (error.backtrace ? error.backtrace.first : nil), tag: tag, time: time}
if @error_collector
# A record is not included in the logs because <@ERROR> handles it. This warn is for the notification
log.warn "send an error event to @ERROR:", error_info
@error_collector.emit(tag, time, record)
else
error_info[:record] = record
log.warn "dump an error event:", error_info
end
end

Although this is the current specification, it looks like not so good.
Maybe it could be improved somehow.

@daipom
Copy link
Contributor

daipom commented Jul 31, 2024

Although this is the current specification, it looks like not so good.
Maybe it could be improved somehow.

It would be better to include the id of the plugin that emits the error event.

It may be easy to manually include the id of filter_parser in the error info that it emits.

if @emit_invalid_record_to_error
router.emit_error_event(tag, time, record, Fluent::Plugin::Parser::ParserError.new("pattern not matched with data '#{raw_value}'"))
end

However, it is not so good because every plugin that emits error events needs to consider this issue.

As @kenhys says in #4567 (comment), it would be great if we could create a common mechanism for this.

another scope of Idea: track plugin which emit error event in router side, then emit more user friendly error message.

@daipom daipom added the enhancement Feature request or improve operations label Jul 31, 2024
@daipom daipom changed the title Logs dont correctly show set @id Error event logs should include the plugin Jul 31, 2024
@daipom daipom changed the title Error event logs should include the plugin Error event logs should include the plugin id Jul 31, 2024
@daipom daipom changed the title Error event logs should include the plugin id Error event logs should include plugin id Jul 31, 2024
@Athishpranav2003
Copy link
Contributor

@daipom so basically we need to avoid change in function signature but capture the caller plugin id?

@daipom
Copy link
Contributor

daipom commented Jul 31, 2024

so basically we need to avoid change in function signature but capture the caller plugin id?

Yes.
We can't change the signature of emit_error_event() because it is related to Event Emitter plugin helper API.
We need to keep compatibility for the plugins that use this function.

@Athishpranav2003
Copy link
Contributor

Athishpranav2003 commented Jul 31, 2024

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

@DavidKnacker
Copy link
Author

@Athishpranav2003 to come back to your question, I do see the @id field for logs generated by the input plugin.

@Athishpranav2003
Copy link
Contributor

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

@daipom what about this?

@DavidKnacker
Copy link
Author

Are there any updates on this?

@daipom
Copy link
Contributor

daipom commented Sep 11, 2024

Sorry for my late response.

Why don't we use log.error method to generate the error log than emitting it as event(It's indirect but works the same way right)?

Filter plugin should use emit_error_event().
It outputs events as @ERROR label.
It is intended to rescue collected data that could not be processed.

On the other hand, the events from Fluentd logger mean logs of Fluentd, not collected data by input plugins.
Fluentd logger outputs events as @FLUENT_LOG label.

Using both would be a possible solution.
However, it would be better if it could be resolved through a common mechanism.

daipom added a commit to daipom/fluentd that referenced this issue Sep 11, 2024
It would be helpful if we could know what plugin emitted the
error event.

fluent#4567

We need to care about the compatibility.
This signature change would not break compatibility.

However, I'm concerned that `caller_plugin_id` has a race
condition, although I don't confirm it.
It looks to me that the id can be another plugin-id running
concurrently...
It is not the issue with this fix, it is the issue of the
existing implementation.

Signed-off-by: Daijiro Fukuda <[email protected]>
@daipom
Copy link
Contributor

daipom commented Sep 11, 2024

I made a PR for this issue.

However, I'm concerned about some race conditions of the related existing logic...

@DavidKnacker
Copy link
Author

I cannot say anything about potential race conditions, but I just tested the PR in my cluster and it seems to exactly achieve what I hoped for.
Thanks @daipom! :)
Would be great to have this in.

@daipom
Copy link
Contributor

daipom commented Sep 18, 2024

@DavidKnacker Thanks! I will check the race condition!

@DavidKnacker
Copy link
Author

Hey @daipom, are there any updates on the ticket?

@daipom
Copy link
Contributor

daipom commented Nov 13, 2024

Sorry, we need to consider the race condition issue more.
I wish we could get this into 1.18.0, but I am not sure...

daipom added a commit to daipom/fluentd that referenced this issue Nov 27, 2024
It would be helpful if we could know what plugin emitted the
error event.

fluent#4567

We need to care about the compatibility.
This signature change would not break compatibility.

However, I'm concerned that `caller_plugin_id` has a race
condition, although I don't confirm it.
It looks to me that the id can be another plugin-id running
concurrently...
It is not the issue with this fix, it is the issue of the
existing implementation.

Signed-off-by: Daijiro Fukuda <[email protected]>
@daipom
Copy link
Contributor

daipom commented Nov 27, 2024

I confirmed the race condition issue.

We need to fix this race condition first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature request or improve operations
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants