[Filebeat] Add parse_aws_vpc_flow_log processor #33656
Conversation
This is a processor for parsing AWS VPC flow logs. It requires a user specified log format. It can populate the original flow log fields, ECS fields, or both.

Usage:

```yaml
processors:
  - parse_aws_vpc_flow_log:
      format: version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
```

Benchmark:

```
goos: darwin
goarch: arm64 (Apple M1 Max)
pkg: github.com/elastic/beats/v7/x-pack/filebeat/processors/aws_vpcflow
BenchmarkProcessorRun/v5-mode-original-10          2694968   2212 ns/op   2836 B/op   31 allocs/op
BenchmarkProcessorRun/v5-mode-ecs_and_original-10  1812913   3318 ns/op   2972 B/op   36 allocs/op
```
Force-pushed from d149e21 to 6bcd888.
Pinging @elastic/security-external-integrations (Team:Security-External Integrations)
Nice.
One thing I noticed: in existing implementations you can have a mix of VPC log formats, as long as they have a different number of fields. Could we change this so it can handle multiple formats? The use case I see is someone modifies their config for recording VPC flow logs, which would result in different formats being in the same S3 bucket.
I think this was an accidental feature. IMO we wanted to give users the flexibility of an arbitrary format, but didn't have a way to deliver it due to limitations in ingest pipelines. So instead we loaded the pipeline with a few prescribed formats to get close to the feature. I can implement this with the requirement that each format have a unique field count. The execution time cost increases by about … (see the comparison between 3a6e4cd..70bde35).
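As a rough illustration of that approach, here is a hypothetical Go sketch (not the PR's actual implementation): the configured formats are indexed by their field count, and an incoming message is dispatched by counting its whitespace-separated tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// format is a hypothetical stand-in for the processor's parsed format
// definition: the ordered list of VPC flow log field names.
type format struct {
	fields []string
}

// newDispatcher indexes formats by their field count. Each configured
// format must have a unique field count, so the map keys are unique.
func newDispatcher(formats []format) (map[int]format, error) {
	byCount := make(map[int]format, len(formats))
	for _, f := range formats {
		n := len(f.fields)
		if _, exists := byCount[n]; exists {
			return nil, fmt.Errorf("each format must have a unique field count; %d is duplicated", n)
		}
		byCount[n] = f
	}
	return byCount, nil
}

// match picks the format whose field count equals the number of
// space-separated tokens in the flow log message.
func match(byCount map[int]format, message string) (format, []string, bool) {
	tokens := strings.Fields(message)
	f, ok := byCount[len(tokens)]
	return f, tokens, ok
}

func main() {
	v5 := format{fields: strings.Fields("version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status")}
	byCount, _ := newDispatcher([]format{v5})

	msg := "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK"
	if f, tokens, ok := match(byCount, msg); ok {
		fmt.Println(len(f.fields), "fields matched; version =", tokens[0])
	}
}
```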
This gives a speedup and reduces the cost of adding multiple format support.

```
benchmark                                                    old ns/op   new ns/op   delta
BenchmarkProcessorRun/original-mode-v5-message-10            2225        2136        -4.00%
BenchmarkProcessorRun/ecs-mode-v5-message-10                 2875        2817        -2.02%
BenchmarkProcessorRun/ecs_and_original-mode-v5-message-10    3352        3233        -3.55%
```
Force-pushed from c123a55 to 024b82b.
event.type is a list. It will always contain "connection". "allowed" or "denied" will be added based on the VPC flow action of "ACCEPT" or "REJECT".
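A hypothetical Go sketch of that mapping (not the processor's actual code), showing how the event.type values could be derived from the flow log action field:

```go
package main

import "fmt"

// eventTypeForAction derives the event.type list from the VPC flow log
// action: it always contains "connection", plus "allowed" for ACCEPT
// or "denied" for REJECT.
func eventTypeForAction(action string) []string {
	types := []string{"connection"}
	switch action {
	case "ACCEPT":
		types = append(types, "allowed")
	case "REJECT":
		types = append(types, "denied")
	}
	return types
}

func main() {
	fmt.Println(eventTypeForAction("ACCEPT")) // [connection allowed]
	fmt.Println(eventTypeForAction("REJECT")) // [connection denied]
}
```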
@Mergifyio backport 8.6
This is a processor for parsing AWS VPC flow logs. It requires a user specified log format. It can populate the original flow log fields, ECS fields, or both.

Usage:

```yaml
processors:
  - parse_aws_vpc_flow_log:
      format: version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
  - community_id: ~
```

Benchmark:

```
goos: darwin
goarch: arm64
pkg: github.com/elastic/beats/v7/x-pack/filebeat/processors/aws_vpcflow
BenchmarkProcessorRun/original-mode-v5-message-10         2810948   2138 ns/op   2836 B/op   31 allocs/op
BenchmarkProcessorRun/ecs-mode-v5-message-10              1914754   3107 ns/op   1908 B/op   41 allocs/op
BenchmarkProcessorRun/ecs_and_original-mode-v5-message-10 1693279   3538 ns/op   3076 B/op   41 allocs/op
```

Co-authored-by: Dan Kortschak <[email protected]>
(cherry picked from commit 1a86e42)
✅ Backports have been created
What does this PR do?
This is a processor for parsing AWS VPC flow logs. It requires a user specified log format. It can populate the original flow log fields, ECS fields, or both.
Usage:
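(Configuration reproduced from the description above; the community_id processor is chained after it to compute a flow hash from the ECS network fields.)

```yaml
processors:
  - parse_aws_vpc_flow_log:
      format: version account-id interface-id srcaddr dstaddr srcport dstport protocol packets bytes start end action log-status
  - community_id: ~
```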
Benchmark:
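```
goos: darwin
goarch: arm64
pkg: github.com/elastic/beats/v7/x-pack/filebeat/processors/aws_vpcflow
BenchmarkProcessorRun/original-mode-v5-message-10         2810948   2138 ns/op   2836 B/op   31 allocs/op
BenchmarkProcessorRun/ecs-mode-v5-message-10              1914754   3107 ns/op   1908 B/op   41 allocs/op
BenchmarkProcessorRun/ecs_and_original-mode-v5-message-10 1693279   3538 ns/op   3076 B/op   41 allocs/op
```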
Why is it important?
The normal volume of flow logs makes processing them a hot path, so this provides a Beat processor that makes processing as efficient as possible and makes it possible to use the Beat host's CPU to do the processing.
Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.
Authors Notes
Need to add static fields like event.kind, event.type, cloud.provider, etc. This can be added by other processors.
The ecs mode in the processor is more aggressive in reducing duplication. The Fleet integration might want to use ecs_and_original then drop the fields needed to maintain its existing behavior.