Make pipeline creation idempotent #5083

Open
zyy17 opened this issue Dec 3, 2024 · 2 comments
Labels
C-enhancement Category Enhancements

Comments

@zyy17
Collaborator

zyy17 commented Dec 3, 2024

What type of enhancement is this?

User experience

What does the enhancement do?

Current Implementation

Pipeline creation is currently not idempotent. For example, create the same pipeline twice:

# 1
curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "[email protected]"

# 2 Create the same pipeline again.
curl -X "POST" "http://localhost:4000/v1/events/pipelines/nginx_pipeline" -F "[email protected]"

This stores multiple rows for the same pipeline in greptime_private.pipelines:

mysql> select name, schema, created_at from greptime_private.pipelines;
+----------------+--------+----------------------------+
| name           | schema | created_at                 |
+----------------+--------+----------------------------+
| nginx_pipeline | public | 2024-12-03 03:04:08.220261 |
| nginx_pipeline | public | 2024-12-03 03:04:26.441025 |
+----------------+--------+----------------------------+
2 rows in set (0.02 sec)

Expectation

In my opinion, a pipeline should be unique throughout its lifetime. We can use (name, schema) as the unique constraint. Creating a pipeline that already exists should be an UPDATE operation, which means the API /v1/events/pipelines/${pipeline} should have create_or_update semantics. Idempotent creation would be easier to use and operate.
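
To make the expected create_or_update behavior concrete, here is a minimal sketch. It uses an in-memory SQLite table as a stand-in for greptime_private.pipelines, so it is not how GreptimeDB stores pipelines. The function name `create_or_update_pipeline`, the `content` column, and the column name `schema_name` are assumptions made for illustration. It also assumes a SQLite version that supports `ON CONFLICT ... DO UPDATE` (3.24 or later).

```python
import sqlite3


def create_or_update_pipeline(conn, name, schema, content, created_at):
    """Insert a pipeline, or overwrite it if (name, schema) already exists."""
    conn.execute(
        """
        INSERT INTO pipelines (name, schema_name, content, created_at)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (name, schema_name)
        DO UPDATE SET content = excluded.content,
                      created_at = excluded.created_at
        """,
        (name, schema, content, created_at),
    )


# Stand-in table; (name, schema_name) plays the role of the unique constraint.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE pipelines (
        name        TEXT NOT NULL,
        schema_name TEXT NOT NULL,
        content     TEXT NOT NULL,
        created_at  TEXT NOT NULL,
        UNIQUE (name, schema_name)
    )
    """
)

# Two creations of the same pipeline, as in the curl example above.
create_or_update_pipeline(
    conn, "nginx_pipeline", "public", "first upload", "2024-12-03 03:04:08.220261"
)
create_or_update_pipeline(
    conn, "nginx_pipeline", "public", "second upload", "2024-12-03 03:04:26.441025"
)

rows = conn.execute(
    "SELECT name, schema_name, content, created_at FROM pipelines"
).fetchall()
```

With this constraint in place the second call overwrites the first, so `rows` holds a single entry for nginx_pipeline instead of two. In this sketch the timestamp is overwritten as well and so records the last write.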

Implementation challenges

No response

@zyy17 zyy17 added the C-enhancement Category Enhancements label Dec 3, 2024
@sunng87
Member

sunng87 commented Dec 3, 2024

Internally, @paomian implemented a versioning system for pipelines, so the latest one is always used for parsing data. There is also a version parameter with which you can select an exact pipeline by its name and creation time.

https://docs.greptime.com/user-guide/logs/write-logs#http-api
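
As a rough illustration of the lookup described here (not the actual GreptimeDB code), the sketch below resolves a pipeline from rows shaped like the greptime_private.pipelines output in the issue. The function `resolve_pipeline` and the idea that the version is matched as an exact timestamp string are assumptions. See the linked docs for the real parameter format.

```python
from typing import List, Optional, Tuple

# (name, schema, created_at) rows, copied from the table in the issue.
Row = Tuple[str, str, str]

STORED: List[Row] = [
    ("nginx_pipeline", "public", "2024-12-03 03:04:08.220261"),
    ("nginx_pipeline", "public", "2024-12-03 03:04:26.441025"),
]


def resolve_pipeline(
    rows: List[Row], name: str, version: Optional[str] = None
) -> Optional[Row]:
    """Return the row for `name`: the newest one, or the one matching `version`."""
    candidates = [row for row in rows if row[0] == name]
    if not candidates:
        return None
    if version is None:
        # Timestamps share one fixed-width format, so string order is time order.
        return max(candidates, key=lambda row: row[2])
    for row in candidates:
        if row[2] == version:
            return row
    return None


latest = resolve_pipeline(STORED, "nginx_pipeline")
pinned = resolve_pipeline(STORED, "nginx_pipeline", "2024-12-03 03:04:08.220261")
```

Here `latest` is the row created at 03:04:26 and `pinned` is the earlier one. Pinning only works if the caller already knows the exact creation timestamp.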

@zyy17
Collaborator Author

zyy17 commented Dec 3, 2024

@sunng87 @paomian I think using the creation timestamp as an implicit version is hard to work with: the user has to query the exact timestamp before they can reference it.

We can consider the following approaches:

  • If the user doesn't specify the version field when creating, always store only the latest version of the pipeline. It's very confusing to get multiple pipelines back when the user queries the greptime_private.pipelines table. Actually, it's enough to store the latest one.

  • If the user specifies the version explicitly, follow the current logic.

Furthermore, maybe we should refactor the docs and add the version field in creation.
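
The two cases in the list above could be sketched roughly as follows. This is an in-memory illustration only. The `create_pipeline` function, the dict-based store, and the way explicit versions are kept side by side are assumptions, since the proposal does not define what an explicit version looks like.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]                # (name, schema)
Entry = Tuple[Optional[str], str]    # (version or None, pipeline content)


def create_pipeline(
    store: Dict[Key, List[Entry]],
    name: str,
    schema: str,
    content: str,
    version: Optional[str] = None,
) -> List[Entry]:
    """Store a pipeline; unversioned creations overwrite each other."""
    entries = store.setdefault((name, schema), [])
    if version is None:
        # No version given: keep only the latest unversioned pipeline.
        entries[:] = [entry for entry in entries if entry[0] is not None]
        entries.append((None, content))
    else:
        # Explicit version: append, mirroring the current multi-row behavior.
        entries.append((version, content))
    return entries


store: Dict[Key, List[Entry]] = {}
create_pipeline(store, "nginx_pipeline", "public", "first upload")
create_pipeline(store, "nginx_pipeline", "public", "second upload")
after_unversioned = list(store[("nginx_pipeline", "public")])

create_pipeline(store, "nginx_pipeline", "public", "pinned upload", version="v1")
after_versioned = list(store[("nginx_pipeline", "public")])
```

Repeating the unversioned creation leaves one entry, which is the idempotent behavior asked for, while an explicit version adds a second entry next to it.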
