
Enable the Spark Operator to launch applications using user-defined mechanisms beyond the default spark-submit #2337

Open
c-h-afzal opened this issue Nov 26, 2024 · 8 comments

Comments

@c-h-afzal
Contributor

What feature you would like to be added?

The ability for the Spark Operator to adopt a provider pattern (a pluggable mechanism) for launching Spark applications, so that a user can specify an option other than spark-submit to launch them.

Why is this needed?

In the current implementation, the Spark Operator invokes spark-submit to launch each Spark application in the cluster. Our testing shows that the JVM spin-up penalty significantly increases job latency when the cluster is under heavy load: when Spark applications are enqueued faster than they are dequeued, the operator's internal queue swells and job latencies grow. We want to be able to launch Spark applications using native Go, without the JVM spin-up that spark-submit incurs.
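To make the queueing argument concrete, here is a rough back-of-the-envelope model, not taken from the issue: assume applications arrive at a constant rate and a fixed pool of workers each spends a fixed latency per submission. Whenever arrivals outpace `workers / latency`, the backlog grows linearly. The function name and all numbers below are hypothetical and only illustrate why shaving per-submission latency (e.g. JVM start-up) matters under load.

```go
package main

import "fmt"

// backlogAfter approximates how many applications are still queued after
// `seconds`, assuming a constant arrival rate and `workers` submitters that
// each spend submitLatencySec per submission (a simple fluid approximation).
func backlogAfter(arrivalsPerSec float64, workers int, submitLatencySec, seconds float64) float64 {
	serviceRate := float64(workers) / submitLatencySec // submissions completed per second
	growth := arrivalsPerSec - serviceRate
	if growth <= 0 {
		return 0 // the queue drains at least as fast as it fills
	}
	return growth * seconds
}

func main() {
	// Hypothetical numbers, for illustration only.
	fmt.Println(backlogAfter(5, 10, 4, 60))   // 2.5/s served, so (5-2.5)*60 = 150 queued
	fmt.Println(backlogAfter(5, 10, 0.5, 60)) // 20/s served, so the queue does not grow: 0
}
```

In this toy model, cutting per-submission latency from 4 s to 0.5 s moves the system from a growing backlog to a stable one without adding workers.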

Describe the solution you would like

The solution we are proposing (and are willing to contribute, if consensus can be reached) is for the Spark Operator to make its launch mechanism, currently fixed to spark-submit, replaceable by a user-specified one. spark-submit remains the default, and users can supply their own plugin to launch Spark applications differently. Specifically, here at the Salesforce Big Data Org, our fork creates driver pods directly in Go and skips the JVM penalty. The workaround was devised by @gangahiremath and mentioned in issue #1574.

Our workaround ports the functionality of spark-submit to Go and significantly reduces the time it takes for a SparkApplication object to be CREATED and then transition to the SUBMITTED state. If there's enough interest in our approach, we plan to open-source Ganga's workaround too.
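A minimal sketch of what a pluggable launch mechanism could look like in Go, assuming a small interface plus a name-keyed registry with spark-submit as the fallback. Everything here is hypothetical: `SparkApp` is a trimmed stand-in for the SparkApplication CRD, and the `Launcher` field, the `Launcher` interface, `launcherFor`, and the plugin names `"spark-submit"` and `"native-go"` are illustrative, not the operator's actual API or the design in the linked doc. Both implementations only print what they would do.

```go
package main

import (
	"context"
	"fmt"
)

// SparkApp is a hypothetical, trimmed-down stand-in for the SparkApplication CRD.
type SparkApp struct {
	Name      string
	Namespace string
	MainClass string
	Launcher  string // hypothetical field naming the launch plugin; empty means default
}

// Launcher is a hypothetical plugin interface: anything that can start a driver for an app.
type Launcher interface {
	Launch(ctx context.Context, app SparkApp) error
}

// sparkSubmitLauncher stands in for today's behavior of shelling out to spark-submit.
type sparkSubmitLauncher struct{}

func (sparkSubmitLauncher) Launch(ctx context.Context, app SparkApp) error {
	// A real implementation would run spark-submit here; this sketch only prints the intent.
	fmt.Printf("spark-submit --class %s (%s/%s)\n", app.MainClass, app.Namespace, app.Name)
	return nil
}

// nativeGoLauncher stands in for a plugin that creates the driver pod directly from Go.
type nativeGoLauncher struct{}

func (nativeGoLauncher) Launch(ctx context.Context, app SparkApp) error {
	fmt.Printf("create driver pod %s-driver in namespace %s\n", app.Name, app.Namespace)
	return nil
}

// registry maps plugin names to implementations.
var registry = map[string]Launcher{
	"spark-submit": sparkSubmitLauncher{},
	"native-go":    nativeGoLauncher{},
}

// launcherFor resolves the plugin for an app, defaulting to spark-submit.
func launcherFor(app SparkApp) (Launcher, error) {
	name := app.Launcher
	if name == "" {
		name = "spark-submit"
	}
	l, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown launcher %q", name)
	}
	return l, nil
}

func main() {
	app := SparkApp{Name: "pi", Namespace: "default", MainClass: "org.apache.spark.examples.SparkPi"}
	l, err := launcherFor(app)
	if err != nil {
		panic(err)
	}
	_ = l.Launch(context.Background(), app) // prints the spark-submit line, since no plugin was named
}
```

Keeping spark-submit as the zero-value default means existing applications behave exactly as before unless they opt in.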

Describe alternatives you have considered

To improve latencies, we have considered the pending PR, which claims a performance boost from having a single queue per app. However, we have not seen the claimed performance improvements in our testing. We still find JVM spin-up time to be the bottleneck, hence this proposal.

Additional context

No response


@yuchaoran2011
Contributor

Thanks for the thoughtful proposal. If the Go native way of creating Spark driver is inherently faster than Spark-submit, I would argue that it makes sense to use that as the default. Do you see any limitations compared to spark-submit?

@c-h-afzal
Contributor Author

@yuchaoran2011 - Thanks for your response. So there are a couple of reasons:

  1. We'll have to maintain functional parity with spark-submit for launching Spark applications. Any changes or additions to spark-submit would need to be replicated in the native Go alternative.

  2. Depending on workload characteristics, some users may not see significant performance gains. In our testing the Go workaround has always outperformed spark-submit, but the author of PR #1990 mentioned that they didn't see performance improvements when using our PR. So we don't want to propose making our workaround the default until we have enough support and anecdotal evidence from the community to make the switch.

Please let me know if you have any more questions. Thank you.

@yuchaoran2011
Contributor

All valid points. Maintaining feature parity with spark-submit will be a long-term effort that the community is willing to shoulder. Before contributing the code changes, I think the community can decide on a direction more easily if you can provide a document detailing the overall design (i.e. how the pluggable driver creation mechanism works) and the benchmarking you did comparing your implementation with the current code base. Tagging the rest of the maintainers for their thoughts as well @ChenYi015 @vara-bonthu @jacobsalway

@bnetzi

bnetzi commented Nov 27, 2024

I think this is a great feature, but I still think it should be opt-in.
We actually tried the proposed code in our environment: master...gangahiremath:spark-on-k8s-operator:master

It does work, but there were some minor differences that we had to implement ourselves (I don't recall the details now; I believe one was related to volume mounts on pod templates).

The point is that spark-submit behavior can vary widely by configuration, so building a replacement that is equivalent for every usage type would require ongoing effort.

Also, the Spark Operator can currently support multiple Spark versions without any issues. With this approach, if spark-submit changes in the future, the Spark Operator would need either to know which Spark version it is running and adjust its behavior accordingly, or to be coupled to a specific Spark version.
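One way to picture the first option (the operator knowing which Spark version it runs) is a gate that uses a native launcher only for Spark lines it has been validated against and falls back to spark-submit otherwise. This is a hypothetical sketch: the function names, the `"native-go"` plugin name, and the version entries in `nativeValidated` are placeholders, not a tested compatibility matrix. It assumes the app's Spark version string is available (SparkApplication specs typically carry a `sparkVersion` field).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor parses a version string such as "3.5.1" into its major and minor parts.
func majorMinor(version string) (int, int, error) {
	parts := strings.Split(version, ".")
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed Spark version %q", version)
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// nativeValidated lists Spark major.minor lines a hypothetical native Go launcher
// has been checked against for parity with spark-submit (placeholder entries).
var nativeValidated = map[[2]int]bool{
	{3, 4}: true,
	{3, 5}: true,
}

// chooseLauncher uses the native launcher only when it was requested and the
// app's Spark version is in the validated set; otherwise it falls back to spark-submit.
func chooseLauncher(requested, sparkVersion string) string {
	if requested != "native-go" {
		return "spark-submit"
	}
	major, minor, err := majorMinor(sparkVersion)
	if err != nil || !nativeValidated[[2]int{major, minor}] {
		return "spark-submit"
	}
	return "native-go"
}

func main() {
	fmt.Println(chooseLauncher("native-go", "3.5.1")) // native-go
	fmt.Println(chooseLauncher("native-go", "4.0.0")) // spark-submit (version not validated)
	fmt.Println(chooseLauncher("", "3.5.1"))          // spark-submit (native not requested)
}
```

The fallback keeps unknown or newer Spark versions on spark-submit, at the cost of maintaining the validated list as Spark evolves, which is the ongoing effort described above.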

@c-h-afzal
Contributor Author

@yuchaoran2011 - We'll work on a design and share it with the community. Thanks.

@c-h-afzal
Contributor Author

c-h-afzal commented Dec 10, 2024

Hey guys, we have the design doc ready. Please feel free to add any comments or feedback. I also added a link in the doc to a PR that demonstrates what the changes would eventually look like. Thanks.

fyi @yuchaoran2011

@yuchaoran2011
Contributor

@c-h-afzal Requested access. Could you open up read-only access for everyone so that the community can review and comment?

@c-h-afzal
Contributor Author

@yuchaoran2011 Ah, I'd love to, but since the doc is on a Salesforce account, company policy restricts public access. Let me check internally whether there's a way to make this doc public. In the meantime, I have given you access.
