Replies: 1 comment
I am finding a few clues to explain the 40-second window when "nothing" seems to be happening. In the Databricks UI there is a "log4j" window that shows logging from the driver. It had very limited detail, and the information that was available didn't span the entire 40-second window, so I assumed it would be difficult to investigate that interval. However, I discovered a link to a file, "log4j-active.log", and it turns out to have different contents than what is shown in the UI (confusingly enough). It has quite a bit more detail to explain the 40-second window when "nothing" is happening.

In general, there is a lot of setup work happening during this interval. I thought this would have been done previously, in the "instance pool", but apparently that was not the case; this setup work must depend on the specific configuration of my job cluster. Much of it is proportional to the number of Java assemblies that are loaded, and some of it is related to the .NET side (e.g. unzipping the driver code). Most of this setup work seems reasonable, but I noticed that a few seconds are spent preparing to use R (RDriverLocal). That seems like something that should be opt-in. Can someone tell me how to disable it? I haven't been able to Google this very well, probably because the letter R is ignored in my searches.
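One thing I may try next, though I have no idea yet whether it actually skips the RDriverLocal setup: Databricks documents a cluster Spark conf, spark.databricks.repl.allowedLanguages, for restricting which languages a cluster exposes. In the cluster spec that would look something like the following (the value shown is only a guess at a combination that excludes R):

```json
{
  "spark_conf": {
    "spark.databricks.repl.allowedLanguages": "python,sql"
  }
}
```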
I'm using Azure Databricks to host a basic .NET for Spark application. I use their "cluster pool" feature to try to improve startup performance; it keeps VMs that are already warmed up and ready for use.
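For anyone not familiar with .NET for Spark, a minimal application looks roughly like the sketch below; the app name and input path are placeholders, not my actual code.

```csharp
using Microsoft.Spark.Sql;

namespace BasicSparkApp
{
    class Program
    {
        static void Main(string[] args)
        {
            // Attach to (or create) the Spark session provided by the cluster.
            SparkSession spark = SparkSession
                .Builder()
                .AppName("basic-dotnet-spark")
                .GetOrCreate();

            // A trivial read-and-filter, just enough to trigger a Spark job.
            DataFrame people = spark.Read().Json("/tmp/people.json"); // placeholder path
            people.Filter(people["age"] > 21).Show();

            spark.Stop();
        }
    }
}
```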
Executing Spark applications is never snappy by any means, but Databricks claims their platform runs more efficiently than most.
I can't understand what is happening in the forty seconds between when my executors are added and when the first job is started.
I will try to dig into this more deeply today, but I was wondering if anyone already knows off the top of their head. Is this specific to our configuration, or is there some design principle that would explain why Databricks should take 40 seconds to start running my code in a job cluster (one that is created from a pool and already has executors)?
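For concreteness, the job cluster in question is defined with a spec along these lines, pointing at the pool (field names are from the Databricks Clusters API; the pool ID and worker count below are placeholders rather than my real values):

```json
{
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "instance_pool_id": "<my-pool-id>",
    "num_workers": 2
  }
}
```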
My on-prem stand-alone cluster doesn't behave this way. The main differences are that Azure Databricks is running in Azure, is running the 7.3 LTS runtime (rather than open-source Apache Spark 3.0.0), and its VMs are all hosted on a VNet.
Please let me know if anyone has ideas; otherwise I will keep digging and open a support ticket.
Thanks for letting me vent. Here is the code that is submitted in this example.
Thanks, David