Open a new command prompt window and run the following:
```shell
spark-submit \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  <path-to-microsoft-spark-jar> \
  debug
```
and you will see the following output:
```
***********************************************************************
* .NET Backend running debug mode. Press enter to exit               *
***********************************************************************
```
In this debug mode, `DotnetRunner` does not launch the .NET application, but waits for it to connect. Leave this command prompt window open.
Now you can start your .NET application with a C# debugger (Visual Studio Debugger for Windows/macOS, or the C# Debugger Extension in Visual Studio Code) to debug your application.
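For reference, a minimal sketch of an application you could launch under the debugger is shown below. The project name, namespace, and app name are hypothetical; the `SparkSession` API comes from the Microsoft.Spark package:

```csharp
// Program.cs -- a hypothetical minimal app, used here only to illustrate debugging.
// While DotnetRunner waits in debug mode, launch this app under the debugger;
// it connects to the .NET backend started by spark-submit.
using Microsoft.Spark.Sql;

namespace MySparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession
                .Builder()
                .AppName("debug-example")
                .GetOrCreate();

            DataFrame df = spark.Range(0, 5);
            df.Show(); // Set a breakpoint here to step through driver-side code.

            spark.Stop();
        }
    }
}
```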
Note that debugging a user-defined function (UDF) is currently supported only on Windows with the Visual Studio Debugger.
Before running `spark-submit`, set the following environment variable:

```shell
set DOTNET_WORKER_DEBUG=1
```
Now, when you run your Spark application, a `Choose Just-In-Time Debugger` window will pop up. Choose a Visual Studio debugger.
The debugger will break at the following location in `TaskRunner.cs`:

```csharp
if (EnvironmentUtils.GetEnvironmentVariableAsBool("DOTNET_WORKER_DEBUG"))
{
    Debugger.Launch(); // <-- The debugger will break here.
}
```
Now, navigate to the `.cs` file that contains the UDF that you plan to debug, and set a breakpoint. (The breakpoint will say "The breakpoint will not currently be hit" because the worker hasn't loaded the assembly that contains the UDF yet.)
Hit `F5` to continue your application, and the breakpoint will eventually be hit.
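To make this concrete, here is a minimal, hypothetical UDF you could set such a breakpoint in. The namespace and class names are illustrative; the `Udf` helper is from `Microsoft.Spark.Sql.Functions`:

```csharp
using System;
using Microsoft.Spark.Sql;
using static Microsoft.Spark.Sql.Functions;

namespace MySparkApp
{
    class UdfExample
    {
        static void Main(string[] args)
        {
            SparkSession spark = SparkSession.Builder().GetOrCreate();
            DataFrame df = spark.Range(0, 3);

            // Set a breakpoint inside the lambda below; it executes in the .NET
            // worker process, which is what the Just-In-Time debugger attaches to.
            Func<Column, Column> addOne = Udf<long, long>(id => id + 1);

            df.Select(addOne(df["id"])).Show();
            spark.Stop();
        }
    }
}
```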
Note that the `Choose Just-In-Time Debugger` window will pop up for each task. Therefore, make sure to set the number of executors to a low number. For example, you can use the `--master local[1]` option for `spark-submit` to set the number of tasks to 1, which launches a single debugger instance.
If you need to debug the Scala-side code (`DotnetRunner`, `DotnetBackendHandler`, etc.), you can use the following command and attach a debugger to the running process using IntelliJ:
```shell
spark-submit \
  --driver-java-options -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 \
  --class org.apache.spark.deploy.dotnet.DotnetRunner \
  --master local \
  <path-to-microsoft-spark-jar> \
  <path-to-your-app-exe> <argument(s)-to-your-app>
```
We encourage developers to first read Apache Spark's Versioning Policy and Semantic Versioning to get the most out of the instructions below.
At a high level, Spark's versions are: [MAJOR].[FEATURE].[MAINTENANCE]. We will cover the upgrade path for each type of version separately below (in increasing order of effort required).
Since Apache Spark's [MAINTENANCE] releases involve only internal changes (e.g., bug fixes), it is straightforward to upgrade the code base to support a [MAINTENANCE] release. The steps to do this are below:
- In the corresponding `pom.xml`, update the `spark.version` value to the newly released version.
  - For example, if a new patch release is 2.4.3, you will update `src/scala/microsoft-spark-2.4.x/pom.xml` to have `<spark.version>2.4.3</spark.version>`.
- Update `DotnetRunner.supportedSparkVersions` to include the newly released version.
  - For example, if a new patch release is 2.4.3, you will update `src/scala/microsoft-spark-2.4.x/src/main/scala/org/apache/spark/deploy/dotnet/DotnetRunner.scala`.
- Update the azure-pipelines.yml to include E2E testing for the newly released version.

Refer to this commit for an example.
Upgrading for a [FEATURE] release: WIP.

Upgrading for a [MAJOR] release: WIP.