Spark 3.5.3, .NET 8, Dependencies #1178

Open · wants to merge 5 commits into base: main
6 changes: 5 additions & 1 deletion .gitignore
@@ -372,4 +372,8 @@ hs_err_pid*
.ionide/

# Mac dev
.DS_Store
.DS_Store

# Scala intermediate build files
**/.bloop/
**/.metals/
13 changes: 8 additions & 5 deletions README.md
@@ -8,7 +8,7 @@

.NET for Apache Spark is compliant with .NET Standard - a formal specification of .NET APIs that are common across .NET implementations. This means you can use .NET for Apache Spark anywhere you write .NET code allowing you to reuse all the knowledge, skills, code, and libraries you already have as a .NET developer.

.NET for Apache Spark runs on Windows, Linux, and macOS using .NET 6, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks) & [Azure](deployment/README.md#databricks) Databricks.
.NET for Apache Spark runs on Windows, Linux, and macOS using .NET 8, or Windows using .NET Framework. It also runs on all major cloud providers including [Azure HDInsight Spark](deployment/README.md#azure-hdinsight-spark), [Amazon EMR Spark](deployment/README.md#amazon-emr-spark), [AWS](deployment/README.md#databricks) & [Azure](deployment/README.md#databricks) Databricks.

**Note**: We currently have a Spark Project Improvement Proposal JIRA at [SPIP: .NET bindings for Apache Spark](https://issues.apache.org/jira/browse/SPARK-27006) to work with the community towards getting .NET support by default into Apache Spark. We highly encourage you to participate in the discussion.

@@ -40,7 +40,7 @@
<tbody align="center">
<tr>
<td>2.4*</td>
<td rowspan=4><a href="https://github.com/dotnet/spark/releases/tag/v2.1.1">v2.1.1</a></td>
<td rowspan=5><a href="https://github.com/dotnet/spark/releases/tag/v2.1.1">v2.1.1</a></td>
</tr>
<tr>
<td>3.0</td>
@@ -50,6 +50,9 @@
</tr>
<tr>
<td>3.2</td>
</tr>
<tr>
<td>3.5</td>
</tr>
</tbody>
</table>
@@ -61,7 +64,7 @@
.NET for Apache Spark releases are available [here](https://github.com/dotnet/spark/releases) and NuGet packages are available [here](https://www.nuget.org/packages/Microsoft.Spark).

## Get Started
These instructions will show you how to run a .NET for Apache Spark app using .NET 6.
These instructions will show you how to run a .NET for Apache Spark app using .NET 8.
- [Windows Instructions](docs/getting-started/windows-instructions.md)
- [Ubuntu Instructions](docs/getting-started/ubuntu-instructions.md)
- [MacOs Instructions](docs/getting-started/macos-instructions.md)
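As a sketch of what the instructions above lead up to: a .NET for Apache Spark app is launched through `spark-submit` with the bridge jar's `DotnetRunner` entry class. The jar file name, Scala build, and app DLL below are illustrative assumptions for a .NET 8 app on the Spark 3.5 line, not values taken from this PR.

```shell
# Illustrative only: jar name, Scala version, and app DLL are assumptions.
SPARK_JAR="microsoft-spark-3-5_2.12-2.1.1.jar"   # bridge jar matching the Spark line
APP_DLL="MySparkApp.dll"                         # output of 'dotnet publish' for net8.0

# DotnetRunner is the entry class the bridge jar exposes to spark-submit;
# compose the command here rather than executing it, since it needs a cluster.
CMD="spark-submit --class org.apache.spark.deploy.dotnet.DotnetRunner --master local ${SPARK_JAR} dotnet ${APP_DLL}"
echo "${CMD}"
```

On a real deployment the `--master` value and jar path would point at the target cluster instead of `local`.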
@@ -79,8 +82,8 @@ Building from source is very easy and the whole process (from cloning to being a

| | | Instructions |
| :---: | :--- | :--- |
| ![Windows icon](docs/img/windows-icon-32.png) | **Windows** | <ul><li>Local - [.NET Framework 4.6.1](docs/building/windows-instructions.md#using-visual-studio-for-net-framework-461)</li><li>Local - [.NET 6](docs/building/windows-instructions.md#using-net-core-cli-for-net-core)</li><ul> |
| ![Ubuntu icon](docs/img/ubuntu-icon-32.png) | **Ubuntu** | <ul><li>Local - [.NET 6](docs/building/ubuntu-instructions.md)</li><li>[Azure HDInsight Spark - .NET 6](deployment/README.md)</li></ul> |
| ![Windows icon](docs/img/windows-icon-32.png) | **Windows** | <ul><li>Local - [.NET Framework 4.8](docs/building/windows-instructions.md#using-visual-studio-for-net-framework)</li><li>Local - [.NET 8](docs/building/windows-instructions.md#using-net-core-cli-for-net-core)</li><ul> |
| ![Ubuntu icon](docs/img/ubuntu-icon-32.png) | **Ubuntu** | <ul><li>Local - [.NET 8](docs/building/ubuntu-instructions.md)</li><li>[Azure HDInsight Spark - .NET 8](deployment/README.md)</li></ul> |

<a name="samples"></a>
## Samples
6 changes: 3 additions & 3 deletions azure-pipelines-e2e-tests-template.yml
@@ -65,10 +65,10 @@ stages:
mvn -version

- task: UseDotNet@2
displayName: 'Use .NET 6 sdk'
displayName: 'Use .NET 8 sdk'
inputs:
packageType: sdk
version: 6.x
version: 8.x
installationPath: $(Agent.ToolsDirectory)/dotnet

- task: DownloadPipelineArtifact@2
@@ -78,7 +78,7 @@
artifactName: Microsoft.Spark.Binaries

- pwsh: |
$framework = "net6.0"
$framework = "net8.0"

if ($env:AGENT_OS -eq 'Windows_NT') {
$runtimeIdentifier = "win-x64"
11 changes: 10 additions & 1 deletion azure-pipelines-pr.yml
@@ -31,12 +31,17 @@ variables:
backwardCompatibleTestOptions_Linux_3_1: ""
forwardCompatibleTestOptions_Linux_3_1: ""

# Skip all forward/backward compatibility tests since Spark 3.2 is not supported before this release.
# Skip all forward/backward compatibility tests since Spark 3.2 and 3.5 are not supported before this release.
backwardCompatibleTestOptions_Windows_3_2: "--filter FullyQualifiedName=NONE"
forwardCompatibleTestOptions_Windows_3_2: $(backwardCompatibleTestOptions_Windows_3_2)
backwardCompatibleTestOptions_Linux_3_2: $(backwardCompatibleTestOptions_Windows_3_2)
forwardCompatibleTestOptions_Linux_3_2: $(backwardCompatibleTestOptions_Windows_3_2)

backwardCompatibleTestOptions_Windows_3_5: "--filter FullyQualifiedName=NONE"
forwardCompatibleTestOptions_Windows_3_5: $(backwardCompatibleTestOptions_Windows_3_5)
backwardCompatibleTestOptions_Linux_3_5: $(backwardCompatibleTestOptions_Windows_3_5)
forwardCompatibleTestOptions_Linux_3_5: $(backwardCompatibleTestOptions_Windows_3_5)

# Azure DevOps variables are transformed into environment variables, with these variables we
# avoid the first time experience and telemetry to speed up the build.
DOTNET_CLI_TELEMETRY_OPTOUT: 1
@@ -63,6 +68,10 @@ parameters:
- '3.2.1'
- '3.2.2'
- '3.2.3'
- '3.5.0'
- '3.5.1'
- '3.5.2'
- '3.5.3'
# List of OS types to run E2E tests, run each test in both 'Windows' and 'Linux' environments
- name: listOfE2ETestsPoolTypes
type: object
55 changes: 54 additions & 1 deletion azure-pipelines.yml
@@ -31,12 +31,17 @@ variables:
backwardCompatibleTestOptions_Linux_3_1: ""
forwardCompatibleTestOptions_Linux_3_1: ""

# Skip all forward/backward compatibility tests since Spark 3.2 is not supported before this release.
# Skip all forward/backward compatibility tests since Spark 3.2 and 3.5 are not supported before this release.
backwardCompatibleTestOptions_Windows_3_2: "--filter FullyQualifiedName=NONE"
forwardCompatibleTestOptions_Windows_3_2: $(backwardCompatibleTestOptions_Windows_3_2)
backwardCompatibleTestOptions_Linux_3_2: $(backwardCompatibleTestOptions_Windows_3_2)
forwardCompatibleTestOptions_Linux_3_2: $(backwardCompatibleTestOptions_Windows_3_2)

backwardCompatibleTestOptions_Windows_3_5: "--filter FullyQualifiedName=NONE"
forwardCompatibleTestOptions_Windows_3_5: $(backwardCompatibleTestOptions_Windows_3_5)
backwardCompatibleTestOptions_Linux_3_5: $(backwardCompatibleTestOptions_Windows_3_5)
forwardCompatibleTestOptions_Linux_3_5: $(backwardCompatibleTestOptions_Windows_3_5)

# Azure DevOps variables are transformed into environment variables, with these variables we
# avoid the first time experience and telemetry to speed up the build.
DOTNET_CLI_TELEMETRY_OPTOUT: 1
@@ -413,3 +418,51 @@ stages:
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Linux_3_2)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Linux_3_2)
- version: '3.5.0'
enableForwardCompatibleTests: false
enableBackwardCompatibleTests: false
jobOptions:
- pool: 'Windows'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Windows_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Windows_3_5)
- pool: 'Linux'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Linux_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Linux_3_5)
- version: '3.5.1'
enableForwardCompatibleTests: false
enableBackwardCompatibleTests: false
jobOptions:
- pool: 'Windows'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Windows_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Windows_3_5)
- pool: 'Linux'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Linux_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Linux_3_5)
- version: '3.5.2'
enableForwardCompatibleTests: false
enableBackwardCompatibleTests: false
jobOptions:
- pool: 'Windows'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Windows_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Windows_3_5)
- pool: 'Linux'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Linux_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Linux_3_5)
- version: '3.5.3'
enableForwardCompatibleTests: false
enableBackwardCompatibleTests: false
jobOptions:
- pool: 'Windows'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Windows_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Windows_3_5)
- pool: 'Linux'
testOptions: ""
backwardCompatibleTestOptions: $(backwardCompatibleTestOptions_Linux_3_5)
forwardCompatibleTestOptions: $(forwardCompatibleTestOptions_Linux_3_5)
2 changes: 1 addition & 1 deletion benchmark/README.md
@@ -60,7 +60,7 @@ TPCH timing results is written to stdout in the following form: `TPCH_Result,<la
<true for sql tests, false for functional tests>
```
**Note**: Ensure that you build the worker and application with .NET 6 in order to run hardware acceleration queries.
**Note**: Ensure that you build the worker and application with .NET 8 in order to run hardware acceleration queries.
## Python
1. Upload [run_python_benchmark.sh](run_python_benchmark.sh) and all [python tpch benchmark](python/) files to the cluster.
6 changes: 3 additions & 3 deletions benchmark/csharp/Tpch/Tpch.csproj
@@ -2,8 +2,8 @@

<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFrameworks>net461;net6.0</TargetFrameworks>
<TargetFrameworks Condition="'$(OS)' != 'Windows_NT'">net6.0</TargetFrameworks>
<TargetFrameworks>net48;net8.0</TargetFrameworks>
<TargetFrameworks Condition="'$(OS)' != 'Windows_NT'">net8.0</TargetFrameworks>
<RootNamespace>Tpch</RootNamespace>
<AssemblyName>Tpch</AssemblyName>
</PropertyGroup>
@@ -16,7 +16,7 @@
</ItemGroup>

<Choose>
<When Condition="'$(TargetFramework)' == 'net6.0'">
<When Condition="'$(TargetFramework)' == 'net8.0'">
<PropertyGroup>
<AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>
6 changes: 3 additions & 3 deletions deployment/README.md
@@ -63,7 +63,7 @@ Microsoft.Spark.Worker is a backend component that lives on the individual worke
## Azure HDInsight Spark
[Azure HDInsight Spark](https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-overview) is the Microsoft implementation of Apache Spark in the cloud that allows users to launch and configure Spark clusters in Azure. You can use HDInsight Spark clusters to process your data stored in Azure (e.g., [Azure Storage](https://azure.microsoft.com/en-us/services/storage/) and [Azure Data Lake Storage](https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)).

> **Note:** Azure HDInsight Spark is Linux-based. Therefore, if you are interested in deploying your app to Azure HDInsight Spark, make sure your app is .NET Standard compatible and that you use [.NET 6 compiler](https://dotnet.microsoft.com/download) to compile your app.
> **Note:** Azure HDInsight Spark is Linux-based. Therefore, if you are interested in deploying your app to Azure HDInsight Spark, make sure your app is .NET Standard compatible and that you use [.NET 8 compiler](https://dotnet.microsoft.com/download) to compile your app.
### Deploy Microsoft.Spark.Worker
*Note that this step is required only once*
@@ -115,7 +115,7 @@ EOF
## Amazon EMR Spark
[Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html) is a managed cluster platform that simplifies running big data frameworks on AWS.
> **Note:** AWS EMR Spark is Linux-based. Therefore, if you are interested in deploying your app to AWS EMR Spark, make sure your app is .NET Standard compatible and that you use [.NET 6 compiler](https://dotnet.microsoft.com/download) to compile your app.
> **Note:** AWS EMR Spark is Linux-based. Therefore, if you are interested in deploying your app to AWS EMR Spark, make sure your app is .NET Standard compatible and that you use [.NET 8 compiler](https://dotnet.microsoft.com/download) to compile your app.
### Deploy Microsoft.Spark.Worker
*Note that this step is only required at cluster creation*
@@ -160,7 +160,7 @@ foo@bar:~$ aws emr add-steps \
## Databricks
[Databricks](http://databricks.com) is a platform that provides cloud-based big data processing using Apache Spark.
> **Note:** [Azure](https://azure.microsoft.com/en-us/services/databricks/) and [AWS](https://databricks.com/aws) Databricks is Linux-based. Therefore, if you are interested in deploying your app to Databricks, make sure your app is .NET Standard compatible and that you use [.NET 6 compiler](https://dotnet.microsoft.com/download) to compile your app.
> **Note:** [Azure](https://azure.microsoft.com/en-us/services/databricks/) and [AWS](https://databricks.com/aws) Databricks is Linux-based. Therefore, if you are interested in deploying your app to Databricks, make sure your app is .NET Standard compatible and that you use [.NET 8 compiler](https://dotnet.microsoft.com/download) to compile your app.
Databricks allows you to submit Spark .NET apps to an existing active cluster or to create a new cluster every time you launch a job. This requires the **Microsoft.Spark.Worker** to be installed **first** before you submit a Spark .NET app.
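Since the cloud targets above are all Linux-based, the "compile with the .NET 8 compiler" notes typically come down to publishing a Linux build of the app's `net8.0` target before uploading it. A minimal sketch, assuming a placeholder project named `MySparkApp.csproj` (not part of this PR):

```shell
# Sketch under assumptions: 'MySparkApp.csproj' is a hypothetical project name.
# HDInsight, EMR, and Databricks workers run Linux, so target linux-x64.
PROJECT="MySparkApp.csproj"
CMD="dotnet publish ${PROJECT} -c Release -f net8.0 -r linux-x64 --self-contained false"
echo "${CMD}"
```

The published output directory (containing the app DLL and its dependencies) is what gets uploaded to the cluster for `spark-submit`.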