Batch Mapping & Sink #1688
Comments
Since I'm interested in this, I wanted to take a first attempt at implementing it, following the patterns that currently exist. I expect to have some PRs up this week for the first few items on the checklist. This will be my first contribution to this community, so I'll hopefully be able to work well with everyone to work through the details. I made some tasks to help track how I plan to commit items. My initial focus is on Python, as it's my primary use case, though I'll try to cover the other languages as well if I can.
@magelisk - For sinks, batch operation is already available in the interface exposed to developers, with data streamed from the gRPC clients; I don't expect there will be much performance enhancement from sending the data in a batch with a unary call from the client.
I think we should do batching on top of gRPC streaming; we have an issue to track this: #1564
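The "batching on top of gRPC streaming" idea can be sketched in plain Python (this is an illustrative helper, not the actual Numaflow SDK API): the transport stays a stream, and the SDK groups the incoming iterator into fixed-size chunks before handing each chunk to a bulk handler.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batched(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a message stream into lists of at most batch_size items.

    Batching becomes a purely SDK-side grouping concern layered on
    top of the existing streaming transport, so the wire protocol
    does not need a new unary call.
    """
    it = iter(stream)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk
```

For example, `list(batched(range(5), 2))` yields `[[0, 1], [2, 3], [4]]`: a partial final batch is flushed rather than dropped.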
Thanks for the input. I hadn't used the streaming sink, so I wasn't familiar with its interface taking an iterator; that's very useful. Following that, and the recommendation that this be built on top of the streaming interface, I think I've wrapped my head around this. I see that #1564 was already assigned to @yhl25. I don't know if any work has been done on his part (I couldn't find anything in his accessible repos), so I did an initial draft as a personal learning opportunity. I don't want to step on anyone's toes, so feel free to kick it away, but hopefully I can contribute something to help. protodef: numaproj/numaflow-go#129 The biggest question here: I did this with a NEW gRPC endpoint, so the map streamer now has two Fn handlers. This is a deviation from existing patterns but allows maintaining backwards compatibility. I'm not sure how the Numaflow team hopes to handle these kinds of breakages generally speaking.
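The two-handler arrangement mentioned above (a new batch endpoint alongside the existing per-message one) might look roughly like this in Python. All names here are hypothetical illustrations, not the actual Numaflow servicer or proto-generated API:

```python
from typing import Callable, Iterable, List, Optional

class MapServicer:
    """Hypothetical sketch of a servicer holding both the existing
    per-message handler and an optional new batch handler, so the
    added endpoint stays backwards compatible."""

    def __init__(
        self,
        map_fn: Callable[[str], List[str]],
        batch_fn: Optional[Callable[[List[str]], List[str]]] = None,
    ):
        self.map_fn = map_fn
        self.batch_fn = batch_fn

    def map_stream(self, msgs: Iterable[str]) -> List[str]:
        # Existing behavior: invoke the handler once per message.
        out: List[str] = []
        for m in msgs:
            out.extend(self.map_fn(m))
        return out

    def map_batch(self, msgs: List[str]) -> List[str]:
        if self.batch_fn is None:
            # No batch handler registered: fall back to the
            # per-message path, preserving old behavior.
            return self.map_stream(msgs)
        return self.batch_fn(msgs)
```

Keeping the old endpoint untouched and dispatching to the batch handler only when one is registered is one way to add the new call without breaking existing user code.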
@magelisk, thank you for being a contributor to Numaflow. We (the Numaflow team) would like to know how you and the community are using the platform, learn about the use cases, and get some feedback. Would you be open to a chat with us, based on your availability?
@syayi I would be happy to talk at some point. I'm generally pretty free, and based in US EST.
We have a conversation tomorrow, but so that there's a written record, I'll put this here now to solicit and document thoughts. The implementation I currently have adds a new gRPC call within the existing service. This works, though it breaks the 'function-only' method as-is. So I think there are a few options.
I haven't been able to pull this into my operational test bed yet, but local representative testing looks good. Thanks.
We have included this in the 1.3.0 release. |
Summary
I'm looking for the ability for a mapper and a sink to take a batch of items at once. Today these take a single item at a time, regardless of `readBatchSize`, limiting the ability to take advantage of bulk APIs/processing.
Use Cases
I have three main use cases that interest me. All of these need to operate at near real-time, and the messages aren't necessarily required to be ordered for histogram/statistic-type metrics. Thus, horizontal scaling should be easy for my use cases.
If there's a way to accomplish any of this today, I'm happy to adopt it, but I don't think the reduce capability quite fits my expectations, particularly given the time-based windows and the limitations on horizontal scaling imposed by its ordering expectations.
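To make the bulk-API motivation concrete, here is a small sketch of why batch delivery matters for a sink (the client and handler names are hypothetical, not part of any real SDK): one bulk round trip per batch instead of one round trip per message.

```python
from typing import List

class FakeBulkClient:
    """Stand-in for a client with a bulk write API (e.g. a database
    bulk insert). Counts round trips so the savings are visible."""

    def __init__(self):
        self.calls = 0
        self.rows: List[str] = []

    def bulk_write(self, rows: List[str]) -> None:
        self.calls += 1  # one round trip per call, however many rows
        self.rows.extend(rows)

def batch_sink(client: FakeBulkClient, batch: List[str]) -> List[bool]:
    """Hypothetical batch sink handler: issue a single bulk call for
    the whole batch, then return one ack per message as a sink
    contract would typically expect."""
    client.bulk_write(batch)
    return [True] * len(batch)
```

With per-message delivery the same three messages would cost three round trips; with batch delivery, `batch_sink(client, ["a", "b", "c"])` costs one.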
Message from the maintainers:
If you wish to see this enhancement implemented please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.
Tasks