Bulk helper: Support streaming use/infinite streams or generators #2266

spinscale · 2024-05-31T07:36:23Z

🚀 Feature Proposal

A common use case for apps is an infinite consumer that passes data to Elasticsearch via bulk requests. Having been a Java client user for many years, I thought all clients operated the same way and supported this. However, Martijn corrected my assumption in the forum: the various clients differ in whether their bulk ingestion helpers are push-based or pull-based.

My basic idea would be to add support for endless ingestion through the bulk helper (or maybe it already works and just requires documentation updates). This way I could use something like queueable to keep adding data that then gets consumed by the bulk helper.

As mentioned in the thread, there may be corner cases that need to be covered, like the queue being empty for longer than the flush interval.

Also, to align with the other clients, adding an additional threshold based on document count to the bulk helper could make sense.

Motivation

This will make it easier to implement any kind of continuously polling/streaming service that needs to bulk index data into Elasticsearch.

Example

I'd assume the bulk API would not actually need to change (other than maybe adding a number-of-documents option), but it would allow passing a generator that is infinite.

P.S. If this already works as expected, please close - there is still the possibility that I missed this in the docs and asked around for nothing because everything already works 😀

JoshMock · 2024-06-03T18:50:44Z

I've not used queueable before, but if it uses ReadableStream correctly, the bulk helper should already support it. As the docs note, datasource can be an array, async generator, or ReadableStream. (It also works with Buffers, and I'm not sure why that's not documented.) Here is where the code asserts what types it supports, and here is where it begins looping over datasource.
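For illustration, here is a minimal sketch of a push-to-pull adapter that exposes a queue as an async generator, which is one of the datasource types mentioned above. `PushQueue`, `push`, `close`, and `stream` are hypothetical names made up for this example; they are not part of the client or of queueable, and the sketch assumes a single consumer.

```typescript
// Hypothetical helper, not part of @elastic/elasticsearch or queueable.
// Producers push documents in; one consumer pulls them out of an async
// generator that only ends once close() has been called.
class PushQueue<T> {
  private buffered: T[] = [];
  private wake: (() => void) | null = null;
  private closed = false;

  // Producer side: buffer the document and wake the consumer if it is idle.
  push(doc: T): void {
    if (this.closed) throw new Error("queue is closed");
    this.buffered.push(doc);
    this.wake?.();
  }

  // Signal that no more documents will arrive; buffered ones are still drained.
  close(): void {
    this.closed = true;
    this.wake?.();
  }

  // Consumer side (single consumer assumed): yields forever until close().
  async *stream(): AsyncGenerator<T, void, undefined> {
    while (true) {
      while (this.buffered.length > 0) {
        yield this.buffered.shift() as T;
      }
      if (this.closed) return;
      // Nothing buffered: park until the next push() or close().
      await new Promise<void>((resolve) => {
        this.wake = resolve;
      });
      this.wake = null;
    }
  }
}
```

Under these assumptions, something like `client.helpers.bulk({ datasource: queue.stream(), onDocument: () => ({ index: { _index: 'my-index' } }), flushInterval: 5000 })` should keep consuming for as long as the queue stays open, with `'my-index'` and the interval value being placeholders. The promise returned by the helper would then only resolve after `queue.close()` is called, and `flushInterval` is the option I would expect to send a partially filled batch when the producer goes quiet.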

Have you already tried using an infinite stream or generator to see if they work? If so, I'd love to know what problems you ran into, because it should work!

github-actions · 2024-09-02T01:58:47Z

This issue is stale because it has been open 90 days with no activity. Remove the stale label, or leave a comment, or this will be closed in 14 days.

github-actions · 2024-12-03T02:08:31Z

This issue is stale because it has been open 90 days with no activity. Remove the stale label, or leave a comment, or this will be closed in 14 days.

spinscale added the Category: Feature label May 31, 2024

github-actions bot added the stale label Sep 2, 2024

JoshMock added the Area: Helpers label Sep 3, 2024

stale bot removed the stale label Sep 3, 2024

github-actions bot added the stale label Dec 3, 2024
