I'm getting started converting some of our data warehousing to .NET for Spark. The migration from Scala has been fairly easy so far. I've only been working on it for a couple of days, but I am much more productive in .NET: things that used to take me days in Scala now take only a few hours. And not only do we get to use .NET, I get to use Visual Studio too (instead of IntelliJ/modules and Databricks/notebooks). Thanks to all of you who worked on this!

My question is about the underlying architecture, specifically the worker processes that get spawned. Why do these processes (Microsoft.Spark.Worker) keep stopping and restarting? Can't the Spark environment keep them running alongside the worker nodes instead of killing the .NET processes all the time? Continually restarting these processes and reloading assemblies seems like it would add OS overhead, which doesn't seem optimal. Does the Python architecture do something similar?
Are you running this on Windows? If so, the worker needs to be respawned for each task: https://github.com/apache/spark/blob/2fa792aa64f6153af1641d895e2f996b18dfbce4/core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala#L48
On non-Windows platforms, one worker is launched per executor and shared across tasks.
Note that we are piggybacking on how the Python worker is invoked, so there is not much we can do to improve this process unless we modify OSS Spark.
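To put the difference in process launches in perspective, here is a toy Python model of the policy described in this answer. It is not Spark or .NET for Spark code: `uses_shared_worker`, `ExecutorModel`, and `total_launches` are hypothetical names, and the model assumes exactly what the answer states (one fresh worker per task on Windows, one shared worker per executor elsewhere). The only detail borrowed from the linked `PythonWorkerFactory.scala` is that the long-lived daemon path appears to be skipped when the JVM's `os.name` starts with "Windows".

```python
from dataclasses import dataclass


def uses_shared_worker(os_name: str) -> bool:
    """Return True when tasks can share a long-lived worker (non-Windows)."""
    # Assumption modeled on the os.name check in PythonWorkerFactory:
    # Windows cannot fork, so there is no long-lived daemon to hand tasks to.
    return not os_name.startswith("Windows")


@dataclass
class ExecutorModel:
    """Toy executor (hypothetical) that counts worker processes it would start."""
    os_name: str
    launches: int = 0
    _worker_alive: bool = False

    def run_task(self) -> None:
        if uses_shared_worker(self.os_name):
            # Start the worker once, then reuse it for every later task.
            if not self._worker_alive:
                self.launches += 1
                self._worker_alive = True
        else:
            # Spawn a fresh worker for this task; it exits when the task ends.
            self.launches += 1


def total_launches(os_name: str, executors: int, tasks_per_executor: int) -> int:
    """Simulate a job and return the total number of worker launches."""
    total = 0
    for _ in range(executors):
        executor = ExecutorModel(os_name)
        for _ in range(tasks_per_executor):
            executor.run_task()
        total += executor.launches
    return total


if __name__ == "__main__":
    print(total_launches("Windows 10", executors=4, tasks_per_executor=50))  # 200
    print(total_launches("Linux", executors=4, tasks_per_executor=50))       # 4
```

In this simplified model, a job with 4 executors and 50 tasks each starts 200 worker processes on Windows but only 4 on Linux, which is where the restart overhead the question describes would come from.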