Bulk helper: Support streaming use/infinite streams or generators #2266

Open
spinscale opened this issue May 31, 2024 · 3 comments

🚀 Feature Proposal

A common use case for apps is an infinite consumer that passes data over to Elasticsearch via bulk requests. Having been a Java client user for many years, I thought all clients operated the same way and supported this. However, Martijn corrected my assumption in the forum: there is a distinction between push-based and pull-based bulk ingestion helpers in the various clients.

My basic idea would be to add support for endless ingestion through the bulk helper (or maybe it already works and just requires documentation updates). This way I could use something like queueable to keep adding data that then gets consumed by the bulk helper.

As mentioned in the thread, there may be corner cases (like the queue being empty longer than the flush interval) that need to be covered.

Also, to align with the other clients, adding another threshold based on document count to the bulk helper could make sense.
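To make the two thresholds above concrete, here is a minimal sketch of batching an async source by document count, with a flush whenever no document arrives for a given interval. It is not the bulk helper's implementation; `batchByCountOrIdle`, `maxDocs`, and `flushIntervalMs` are hypothetical names used only for illustration.

```typescript
// Hypothetical helper, NOT part of @elastic/elasticsearch: groups documents
// from an async source into batches. A batch is emitted when it reaches
// maxDocs, or when no new document arrives within flushIntervalMs.
async function* batchByCountOrIdle<T>(
  source: AsyncIterable<T>,
  maxDocs: number,
  flushIntervalMs: number
): AsyncGenerator<T[]> {
  const iterator = source[Symbol.asyncIterator]()
  let batch: T[] = []
  let pending: Promise<IteratorResult<T>> | null = null

  while (true) {
    // Keep a single outstanding next() call so that no document is lost
    // when the timer wins the race below.
    if (pending === null) pending = iterator.next()

    let timer: ReturnType<typeof setTimeout> | undefined
    const timeout = new Promise<'timeout'>(resolve => {
      timer = setTimeout(() => resolve('timeout'), flushIntervalMs)
    })
    const winner = await Promise.race([pending, timeout])
    clearTimeout(timer)

    if (winner === 'timeout') {
      // The source was idle: flush what we have. An empty batch is skipped,
      // which covers the "queue empty longer than the flush interval" case.
      if (batch.length > 0) {
        yield batch
        batch = []
      }
      continue
    }

    pending = null
    if (winner.done) break

    batch.push(winner.value)
    if (batch.length >= maxDocs) {
      yield batch
      batch = []
    }
  }

  // The source ended: emit any remaining documents.
  if (batch.length > 0) yield batch
}
```

With an infinite source the loop never reaches the final flush, so the idle timer is the only thing that guarantees a partially filled batch is eventually sent.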

Motivation

This will make it easier to implement any kind of continuously polling/streaming service that needs to bulk index data into Elasticsearch.

Example

I'd assume there is no change to the bulk helper API itself (except maybe adding a number-of-documents option), but it would allow passing a generator that is infinite.

P.S. If this already works as expected, please close - there is still the possibility I missed this in the docs and just asked around for nothing cause everything works as expected 😀

Member

JoshMock commented Jun 3, 2024

I've not used queueable before, but if it uses ReadableStream correctly, the bulk helper should already support it. As the docs note, datasource can be an array, async generator, or ReadableStream. (It also works with Buffers, and I'm not sure why that's not documented.) Here is where the code asserts what types it supports, and here is where it begins looping over datasource.

Have you already tried using an infinite stream or generator to see if they work? If not, I'd love to know what problems you ran into because it should!
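As a rough illustration of the push-to-pull bridging discussed here, the sketch below exposes a queue that producers push into as an async generator, which is one of the datasource types mentioned above. `PushQueue` and `documents` are hypothetical names (this is not queueable's API), and the wiring into `client.helpers.bulk` is shown only as a comment, assuming a configured `@elastic/elasticsearch` client and a placeholder index name.

```typescript
// Hypothetical push-to-pull bridge, not part of any Elasticsearch client:
// producers call push(), consumers pull with `for await`.
class PushQueue<T> implements AsyncIterable<T> {
  private items: T[] = []
  private waiters: Array<(result: IteratorResult<T>) => void> = []
  private closed = false

  // Hand the item to a waiting consumer, or buffer it until one asks.
  push(item: T): void {
    if (this.closed) throw new Error('queue is closed')
    const waiter = this.waiters.shift()
    if (waiter) {
      waiter({ value: item, done: false })
    } else {
      this.items.push(item)
    }
  }

  // Signal the end of input so that a consumer's loop can finish.
  close(): void {
    this.closed = true
    for (const waiter of this.waiters.splice(0)) {
      waiter({ value: undefined, done: true })
    }
  }

  [Symbol.asyncIterator](): AsyncIterator<T> {
    return {
      next: async (): Promise<IteratorResult<T>> => {
        if (this.items.length > 0) {
          return { value: this.items.shift() as T, done: false }
        }
        if (this.closed) {
          return { value: undefined, done: true }
        }
        // Nothing buffered: park until push() or close() is called.
        return new Promise<IteratorResult<T>>(resolve => {
          this.waiters.push(resolve)
        })
      }
    }
  }
}

// Wrap the queue in an async generator. It only ends if the queue is closed.
async function* documents<T>(queue: PushQueue<T>): AsyncGenerator<T> {
  for await (const doc of queue) yield doc
}

// Assumed wiring (not executed here); `client` and 'my-index' are placeholders:
//
//   const queue = new PushQueue<{ message: string }>()
//   const result = client.helpers.bulk({
//     datasource: documents(queue),
//     onDocument: () => ({ index: { _index: 'my-index' } })
//   })
//   queue.push({ message: 'hello' }) // can be called at any later point
```

Note that this queue is unbounded: if producers push faster than documents are consumed, memory grows, so a real service would likely want some form of backpressure.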

Contributor

github-actions bot commented Sep 2, 2024

This issue is stale because it has been open 90 days with no activity. Remove the stale label, or leave a comment, or this will be closed in 14 days.

github-actions bot added the stale label Sep 2, 2024
stale bot removed the stale label Sep 3, 2024
Contributor

github-actions bot commented Dec 3, 2024

This issue is stale because it has been open 90 days with no activity. Remove the stale label, or leave a comment, or this will be closed in 14 days.

github-actions bot added the stale label Dec 3, 2024