These instructions will show you how to run a .NET for Apache Spark app using .NET Core on Windows.
- Download and install the following: .NET Core 3.1 SDK | Visual Studio 2019 | Java 1.8 | Apache Spark 2.4.1
- Download and install Microsoft.Spark.Worker release:
  - Select a Microsoft.Spark.Worker release from the .NET for Apache Spark GitHub Releases page and download it to your local machine (e.g., c:\bin\Microsoft.Spark.Worker\).
  - IMPORTANT: Create a new environment variable DOTNET_WORKER_DIR and set it to the directory where you downloaded and extracted the Microsoft.Spark.Worker (e.g., c:\bin\Microsoft.Spark.Worker).
For detailed instructions, you can see Building .NET for Apache Spark from Source on Windows.
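
A missing or mistyped DOTNET_WORKER_DIR is easy to overlook, so it can help to check the variable before running anything. The following is a minimal sketch of such a check, not part of Microsoft.Spark: the class name WorkerDirCheck and the method Validate are hypothetical names chosen for illustration, and the check only confirms that the variable is set and points to an existing directory.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Microsoft.Spark): a sanity check for the
// DOTNET_WORKER_DIR environment variable described in the prerequisites.
static class WorkerDirCheck
{
    // Returns null when the directory looks usable,
    // otherwise a short description of the problem.
    public static string Validate(string workerDir)
    {
        if (string.IsNullOrWhiteSpace(workerDir))
            return "DOTNET_WORKER_DIR is not set.";
        if (!Directory.Exists(workerDir))
            return $"Directory does not exist: {workerDir}";
        return null;
    }

    static void Main()
    {
        // Reads the value visible to the current process.
        string workerDir = Environment.GetEnvironmentVariable("DOTNET_WORKER_DIR");
        string problem = Validate(workerDir);
        Console.WriteLine(problem ?? $"DOTNET_WORKER_DIR looks fine: {workerDir}");
    }
}
```

If the check reports that the variable is not set right after you created it, try opening a new terminal (or restarting Visual Studio), since a process only sees the environment variables that existed when it started.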
- Open Visual Studio -> Create New Project -> Console App (.NET Core) -> Name: HelloSpark
- Install the Microsoft.Spark NuGet package into the solution from the spark nuget.org feed (see Ways to install Nuget Package)
- Write the following code into Program.cs:

      using Microsoft.Spark.Sql;

      namespace HelloSpark
      {
          class Program
          {
              static void Main(string[] args)
              {
                  var spark = SparkSession.Builder().GetOrCreate();
                  var df = spark.Read().Json("people.json");
                  df.Show();
              }
          }
      }

- Build the solution
- Open your terminal and navigate into your app folder:

      cd <your-app-output-directory>

- Create people.json with the following content:

      {"name":"Michael"}
      {"name":"Andy", "age":30}
      {"name":"Justin", "age":19}

- Run your app:

      spark-submit `
      --class org.apache.spark.deploy.dotnet.DotnetRunner `
      --master local `
      microsoft-spark-<version>.jar `
      dotnet HelloSpark.dll

  Note: This command assumes you have downloaded Apache Spark and added it to your PATH environment variable so that you can use spark-submit; otherwise, you would have to use the full path (e.g., c:\bin\apache-spark\bin\spark-submit). For detailed instructions, you can see Building .NET for Apache Spark from Source on Windows.
- The output of the application should look similar to the output below:

      +----+-------+
      | age|   name|
      +----+-------+
      |null|Michael|
      |  30|   Andy|
      |  19| Justin|
      +----+-------+
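
The people.json file from the steps above holds one JSON object per line, which is the layout Spark's JSON reader expects by default. If you would rather generate the file than type it by hand, the following is a minimal sketch of one way to do so. The class PeopleJson and its members are hypothetical names used only for illustration; they are not part of Microsoft.Spark.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of Microsoft.Spark): writes the sample
// people.json used in this walkthrough, one JSON object per line.
static class PeopleJson
{
    // The three records shown in the "Create people.json" step.
    public static readonly string[] Lines =
    {
        "{\"name\":\"Michael\"}",
        "{\"name\":\"Andy\", \"age\":30}",
        "{\"name\":\"Justin\", \"age\":19}"
    };

    // Writes each record on its own line at the given path.
    public static void Write(string path)
    {
        File.WriteAllLines(path, Lines);
    }

    static void Main(string[] args)
    {
        // Defaults to people.json in the current directory; pass a path
        // as the first argument to write somewhere else.
        string path = args.Length > 0 ? args[0] : "people.json";
        Write(path);
        Console.WriteLine($"Wrote {Lines.Length} records to {Path.GetFullPath(path)}");
    }
}
```

Run it from your app output directory so that the relative path "people.json" used in Program.cs resolves to the generated file.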