Enable the Spark Operator to launch applications using user-defined mechanisms beyond the default spark-submit #2337

c-h-afzal · 2024-11-26T05:06:57Z

What feature you would like to be added?

The ability for the Spark Operator to adopt a provider pattern (a pluggable mechanism) for launching Spark applications, so that a user can specify an option other than spark-submit.

Why is this needed?

In the current implementation, the Spark Operator invokes spark-submit to launch a Spark application in the cluster. From our testing, we have determined that the JVM spin-up penalty causes a significant increase in job latency when the cluster is under heavy load, i.e. when Spark applications are enqueued faster than they are dequeued, so the Spark Operator's internal queue swells and job latencies suffer. We want to be able to launch Spark applications using native Go, without the JVM spin-up that comes with spark-submit.

Describe the solution you would like

The solution we are proposing (and are willing to contribute, if consensus can be reached) is for the Spark Operator to let users replace its only launch mechanism (spark-submit) with a user-specified one. spark-submit remains the default. Users can supply their own plugin to launch Spark applications a different way. Specifically, in our fork (here at Salesforce Big Data Org), we create driver pods using Go and skip the JVM penalty. The workaround was devised by @gangahiremath and is mentioned in issue #1574.

Our work-around ports the functionality of spark-submit to Golang and significantly reduces the time it takes for a SparkApplication CRD object to be CREATED and then transition to the SUBMITTED state. If there’s enough interest in our approach, we plan to open-source Ganga's work-around too.
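As a rough illustration only (this is not the code from the fork or from the design doc), a pluggable launch mechanism could be sketched in Go as below. Every name here (`SparkApp`, `Submitter`, `Lookup`, the registry keys) is hypothetical, and both launchers only print a message instead of really running spark-submit or creating a driver pod.

```go
package main

import (
	"context"
	"fmt"
)

// SparkApp is a trimmed, hypothetical stand-in for the operator's
// SparkApplication object.
type SparkApp struct {
	Name      string
	Namespace string
}

// Submitter is a hypothetical plugin interface: any mechanism that can
// launch a Spark application.
type Submitter interface {
	Submit(ctx context.Context, app SparkApp) error
}

// sparkSubmit models the default path. A real implementation would
// invoke the spark-submit script; this one only prints.
type sparkSubmit struct{}

func (sparkSubmit) Submit(_ context.Context, app SparkApp) error {
	fmt.Printf("spark-submit: launching %s/%s\n", app.Namespace, app.Name)
	return nil
}

// nativeGo models a launcher that would create the driver pod directly
// from Go, avoiding the JVM start-up. This one only prints.
type nativeGo struct{}

func (nativeGo) Submit(_ context.Context, app SparkApp) error {
	fmt.Printf("native-go: creating driver pod for %s/%s\n", app.Namespace, app.Name)
	return nil
}

// registry maps a user-facing option name to a launcher.
var registry = map[string]Submitter{
	"spark-submit": sparkSubmit{},
	"native-go":    nativeGo{},
}

// Lookup returns the launcher for name. An empty name selects
// spark-submit, so the default behaviour stays unchanged.
func Lookup(name string) (Submitter, error) {
	if name == "" {
		name = "spark-submit"
	}
	s, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown submitter %q", name)
	}
	return s, nil
}

func main() {
	app := SparkApp{Name: "pi", Namespace: "default"}
	for _, name := range []string{"", "native-go"} {
		s, err := Lookup(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		if err := s.Submit(context.Background(), app); err != nil {
			fmt.Println("submit failed:", err)
		}
	}
}
```

Keeping spark-submit as the fallback for an empty option is what makes the alternative opt-in: existing deployments would see no change unless they select another launcher.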

Describe alternatives you have considered

To improve latencies, we considered the pending PR, which claims a performance boost from having a single queue per app. However, we have not seen the claimed performance improvements in our testing. We still find JVM spin-up time to be the bottleneck, hence this proposal.

Additional context

No response

Love this feature?

Give it a 👍 We prioritize the features with most 👍

yuchaoran2011 · 2024-11-26T06:39:54Z

Thanks for the thoughtful proposal. If the Go-native way of creating the Spark driver is inherently faster than spark-submit, I would argue that it makes sense to use that as the default. Do you see any limitations compared to spark-submit?

c-h-afzal · 2024-11-26T07:43:47Z

@yuchaoran2011 - Thanks for your response. There are a couple of reasons not to make it the default yet:

1. We'll have to maintain functional parity with spark-submit for launching Spark applications. Any changes or additions in spark-submit would need to be replicated in the native Go alternative.
2. Depending on workload characteristics, some users may not experience significant performance gains. In our testing the Go workaround has always outperformed spark-submit invocations, but the author of PR #1990 mentioned that they didn't see performance improvements when using our PR. So we don't want to propose our workaround as the default until we have enough support and anecdotal evidence from the community to make the switch.

Please let me know if you have any more questions. Thank you.

yuchaoran2011 · 2024-11-27T05:45:30Z

All valid points. Maintaining feature parity with spark-submit will be a long-term effort that the community is willing to shoulder. Before contributing the code changes, I think the community can decide on a direction more easily if you can provide a document detailing the overall design (i.e. how the pluggable driver creation mechanism works) and the benchmarking you did comparing your implementation with the current code base. Tagging the rest of the maintainers for their thoughts as well @ChenYi015 @vara-bonthu @jacobsalway

bnetzi · 2024-11-27T13:49:54Z

I think this is a great feature, but I still think it should be opt-in.
We actually tried in our env to use the proposed code here - master...gangahiremath:spark-on-k8s-operator:master

And it does work, but there were some minor differences that we had to implement ourselves. I don't recall the details right now (I believe it was something related to volume mounts on pod templates).

The point is that spark-submit behavior can be very different per configuration, so making a replacement that is equivalent for all usage types would require an ongoing effort.

Also - for now the Spark Operator can support multiple Spark versions without any issues. With this approach, if spark-submit changes in the future, the Spark Operator would either need to know which version of Spark it runs and change its behavior accordingly, or be coupled to a Spark version.
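To make the version concern concrete, here is a minimal sketch of one way an opt-in native launcher could stay safe across Spark versions: use it only for versions it has been checked against, and fall back to spark-submit otherwise. This is an assumption for illustration, not something proposed in the fork; `ChooseLauncher`, `verifiedVersions`, and the version string in the map are all hypothetical placeholders.

```go
package main

import "fmt"

// verifiedVersions lists Spark versions the native launcher has been
// checked against. The entry below is a placeholder, not a real claim.
var verifiedVersions = map[string]bool{
	"3.5.0": true,
}

// ChooseLauncher returns the name of the launcher to use. The native
// path is taken only when the user opted in AND the Spark version is
// known to behave the same as spark-submit; otherwise spark-submit is
// used.
func ChooseLauncher(sparkVersion string, nativeRequested bool) string {
	if nativeRequested && verifiedVersions[sparkVersion] {
		return "native-go"
	}
	return "spark-submit"
}

func main() {
	fmt.Println(ChooseLauncher("3.5.0", true))
	fmt.Println(ChooseLauncher("9.9.9", true))
	fmt.Println(ChooseLauncher("3.5.0", false))
}
```

The trade-off is the one described above: the operator has to know the Spark version of each application, and the verified list has to be maintained as spark-submit evolves.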

c-h-afzal · 2024-12-03T01:54:28Z

@yuchaoran2011 - We'll work on a design and share it with the community. Thanks.

c-h-afzal · 2024-12-10T03:05:57Z

Hey guys - we have the design doc ready. Please feel free to add any comments or feedback. I also added a PR link to the doc that demonstrates what the changes would eventually look like. Thanks.

fyi @yuchaoran2011

yuchaoran2011 · 2024-12-13T23:57:10Z

@c-h-afzal Requested access. Could you open up read-only access for everyone so that the community can review and comment?

c-h-afzal · 2024-12-14T00:22:37Z

@yuchaoran2011 Ah, I'd love to, but since the doc is on a Salesforce account, company policy restricts public access. Let me check internally whether there's a way for me to make this doc public. In the meantime, I have given you access.

c-h-afzal added the kind/feature label Nov 26, 2024

c-h-afzal mentioned this issue Nov 26, 2024

Draft: Performance mega boost - queue per app #1990
