Spark backend does not work on Databricks #4642

HaraldVanWoerkom · 2022-05-20T06:17:13Z

When I create an estimator with a 'spark' backend on Databricks, I have to provide a path to a shared folder. On Databricks that is on /dbfs/...:
est = Estimator.from_keras(model_creator=my_model_creator, backend='spark', model_dir='/dbfs/tmp/bigdl')

Unfortunately, the fit crashes with an "operation not supported" exception. This is (apparently) because the model-writing code uses random access, which the file system does not support.
BigDL has a workaround for this: it can write to local storage and then copy the result to the shared folder. This is triggered by setting model_dir to 'dbfs:/tmp/bigdl'. Unfortunately, that causes an exception in the save_pkl function in bigdl/orca/learn/utils.py, because 'open' does not support the dbfs: scheme (yes, in Databricks some APIs require dbfs:, while others do not support it).
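
For readers unfamiliar with the write-locally-then-copy approach, here is a minimal sketch of the general idea. It is not BigDL's actual implementation; the function name save_pkl_via_local_copy is hypothetical, and the destination is assumed to be a path that ordinary file APIs can reach (for example one under /dbfs/).

```python
import os
import pickle
import shutil
import tempfile


def save_pkl_via_local_copy(data, dest_path):
    """Pickle `data` to a local temp file, then copy it to `dest_path`.

    Hypothetical helper for illustration only; not part of BigDL.
    """
    # Write to local disk first, which supports whatever write pattern
    # the serializer happens to use.
    fd, tmp_path = tempfile.mkstemp(suffix=".pkl")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(data, f)
        # Copy the finished file to the shared folder.
        os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
        shutil.copyfile(tmp_path, dest_path)
    finally:
        # Always clean up the local temp file.
        os.remove(tmp_path)
    return dest_path
```

The shared file system then only receives a copy of an already complete file, rather than serving as the target of the serializer's own writes.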

I patched utils.py to support dbfs (line 380):
else:
    if path.startswith("file://"):
        path = path[len("file://"):]
    elif path.startswith("dbfs:/"):  # NEW
        path = "/dbfs/" + path[len("dbfs:/"):]  # NEW
    with open(path, 'wb') as f:
        pickle.dump(data, f)

This seems to work.
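
The same mapping can be tried on its own, outside BigDL, with a small standalone function. This is a sketch only: the name to_local_path is hypothetical and is not a BigDL API. As a small extension beyond the patch above, it also strips extra slashes so that dbfs:///tmp/bigdl is handled the same way as dbfs:/tmp/bigdl.

```python
def to_local_path(path):
    """Map a file:// or dbfs:/ URI to a path the built-in open() accepts.

    Hypothetical helper for illustration only; not part of BigDL.
    """
    if path.startswith("file://"):
        return path[len("file://"):]
    if path.startswith("dbfs:/"):
        # On Databricks, DBFS is exposed to local file APIs under /dbfs/.
        return "/dbfs/" + path[len("dbfs:"):].lstrip("/")
    # Anything else is assumed to already be a plain local path.
    return path
```

For example, to_local_path("dbfs:/tmp/bigdl") yields "/dbfs/tmp/bigdl", while a path that already starts with /dbfs/ is returned unchanged.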

jason-dai · 2022-05-20T10:44:45Z

@HaraldVanWoerkom Thanks for reporting the issue; we'll take a look.

Le-Zheng · 2022-05-25T01:48:08Z

@HaraldVanWoerkom Many thanks for posting this! I have verified that the solution path = "/dbfs/" + path[len("dbfs:/"):] works on Databricks. I will create a fix in the source code.

HaraldVanWoerkom · 2022-05-25T05:54:01Z

@Le-Zheng Thanks for picking this up.

Le-Zheng · 2022-05-25T19:17:13Z

> @Le-Zheng Thanks for picking this up.

Sure. The Estimator.from_keras API does not call the save_pkl function.

Le-Zheng · 2022-05-25T19:17:25Z

This issue will be fixed in patch #4674.

helenlly added the user issue label May 24, 2022

helenlly assigned Le-Zheng May 24, 2022

PatrickkZ mentioned this issue Sep 8, 2022

fix issue 4642, fix DBFS file path problem on Dataricks #5679

Merged

