Enable the Spark Operator to launch applications using user-defined mechanisms beyond the default spark-submit #2337
Comments
Thanks for the thoughtful proposal. If the Go-native way of creating the Spark driver is inherently faster than spark-submit, I would argue that it makes sense to use that as the default. Do you see any limitations compared to spark-submit?
@yuchaoran2011 - Thanks for your response. So there are a couple of reasons:
Please let me know if you have any more questions. Thank you.
All valid points. Maintaining feature parity with spark-submit will be a long-term effort that the community is willing to shoulder. Before contributing the code changes, I think the community can decide on a direction more easily if you can provide a document detailing the overall design (i.e. how the pluggable driver creation mechanism works) and the benchmarking you did comparing your implementation with the current code base. Tagging the rest of the maintainers for their thoughts as well @ChenYi015 @vara-bonthu @jacobsalway
I think this is a great feature, but I still think it should be opt-in. And it does work, but there were some minor differences which I don't recall exactly (I believe something related to volume mounts on pod templates) that we had to implement ourselves. The point is, spark-submit behavior can differ a lot per configuration, so making a replacement that is equivalent for all usage types would require an ongoing effort. Also, for now the Spark Operator can support multiple Spark versions without any issues. With this approach, if spark-submit changed in the future, the Spark Operator would either need to know which Spark version it runs and change its behavior accordingly, or be coupled to a specific Spark version.
@yuchaoran2011 - We'll work on a design and share it with the community. Thanks.
Hey guys - we have the design doc ready. Please feel free to add any comments/feedback. Also, I added to the doc a link to a PR that demonstrates what the changes would eventually look like. Thanks. fyi @yuchaoran2011
@c-h-afzal Requested access. Could you open up read-only access for everyone so that the community can review and comment?
@yuchaoran2011 Ah, I'd love to, but since the doc is on a Salesforce account, company policy restricts public access. Let me check internally whether there's a way for me to make this doc public. In the meantime, I have given you access.
What feature would you like to be added?
The ability for the Spark Operator to adopt a provider pattern / pluggable mechanism for launching Spark applications. A user can specify an option other than spark-submit to launch Spark applications.
Why is this needed?
In the current implementation, the Spark Operator invokes spark-submit to launch a Spark application in the cluster. From our testing we have determined that the JVM spin-up penalty causes a significant increase in job latency when the cluster is under stress/heavy load, i.e. when Spark applications are enqueued faster than they are dequeued, the Spark Operator's internal queue swells up and job latencies suffer. We want to be able to launch Spark applications using native Go, without the JVM spin-up incurred by spark-submit.
Describe the solution you would like
The solution we are proposing (and are willing to contribute, if consensus can be reached) is for the Spark Operator to allow replacing its only current mechanism for launching Spark applications (spark-submit) with a user-specified one. The default mechanism remains spark-submit; users can specify their own plugin to launch Spark applications a different way (see the interface sketch below). Specifically, here at the Salesforce Big Data Org, in our fork we create driver pods using Go and skip the JVM penalty. The workaround was devised by @gangahiremath and is mentioned in issue #1574.
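For illustration only, here is a minimal sketch of what such a pluggable launcher could look like in Go. All names here (`SubmissionHandler`, `Register`, `Lookup`, the import path) are hypothetical placeholders, not the actual proposed API:

```go
package launcher

import (
	"context"

	// Import path is an assumption for the sketch, not the real layout.
	"github.com/kubeflow/spark-operator/api/v1beta2"
)

// SubmissionHandler abstracts the mechanism used to launch the driver
// for a SparkApplication. The default implementation would shell out
// to spark-submit; alternatives (e.g. a native Go launcher) can be
// registered and selected via an operator flag.
type SubmissionHandler interface {
	Submit(ctx context.Context, app *v1beta2.SparkApplication) error
}

// handlers holds the registered launch mechanisms by name.
var handlers = map[string]SubmissionHandler{}

// Register makes a handler selectable, e.g. via a hypothetical
// --submission-handler=<name> operator flag.
func Register(name string, h SubmissionHandler) {
	handlers[name] = h
}

// Lookup returns the handler registered under the given name, if any.
func Lookup(name string) (SubmissionHandler, bool) {
	h, ok := handlers[name]
	return h, ok
}
```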
Our workaround ports the functionality of spark-submit to Go and significantly reduces the time it takes for a SparkApplication CRD object to be CREATED and then transition to the SUBMITTED state. If there's enough interest in our approach, we plan to open-source Ganga's workaround too; a rough sketch of the idea follows.
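As a rough sketch of the Go-native path (again hypothetical, not Ganga's actual implementation): the launcher builds the driver pod spec directly and creates it through client-go, so no spark-submit JVM is ever forked. A real port would have to reproduce all of spark-submit's conf and pod-template handling, which is exactly the feature-parity effort discussed above:

```go
package launcher

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"

	"github.com/kubeflow/spark-operator/api/v1beta2"
)

// NativeLauncher is a hypothetical SubmissionHandler that creates the
// driver pod via the Kubernetes API instead of invoking spark-submit.
type NativeLauncher struct {
	client kubernetes.Interface
}

func (l *NativeLauncher) Submit(ctx context.Context, app *v1beta2.SparkApplication) error {
	image := ""
	if app.Spec.Image != nil {
		image = *app.Spec.Image
	}
	driver := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:      fmt.Sprintf("%s-driver", app.Name),
			Namespace: app.Namespace,
			// Same role label spark-submit would set on the driver pod.
			Labels: map[string]string{"spark-role": "driver"},
		},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:  "spark-kubernetes-driver",
				Image: image,
				// Greatly simplified: a real port must translate every
				// relevant SparkApplication/spark-submit setting into the
				// pod spec (volumes, env, resources, pod templates, ...).
			}},
		},
	}
	_, err := l.client.CoreV1().Pods(app.Namespace).Create(ctx, driver, metav1.CreateOptions{})
	return err
}
```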
Describe alternatives you have considered
For improving latencies, we have considered the pending PR that claims a performance boost by having a single work queue per app. However, we have not observed the claimed performance improvements in our testing. We still find JVM spin-up times to be the bottleneck, hence this proposal.
Additional context
No response
Love this feature?
Give it a 👍. We prioritize the features with the most 👍.