When I create an estimator with the 'spark' backend on Databricks, I have to provide a path to a shared folder, which on Databricks lives under /dbfs/...:
est = Estimator.from_keras(model_creator=my_model_creator, backend='spark', model_dir='/dbfs/tmp/bigdl')
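For context, this is roughly how the estimator is created and used (a minimal sketch assuming the Orca TF2 Estimator and a Spark DataFrame df; the model, columns and hyperparameters are illustrative, not my actual job):

from bigdl.orca import init_orca_context
from bigdl.orca.learn.tf2 import Estimator
import tensorflow as tf

def my_model_creator(config):
    # Orca calls this on each worker to build and compile the Keras model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

init_orca_context(cluster_mode="spark-submit")  # typical mode on Databricks

est = Estimator.from_keras(model_creator=my_model_creator,
                           backend='spark',
                           model_dir='/dbfs/tmp/bigdl')

est.fit(data=df, epochs=1, batch_size=32,
        feature_cols=["features"], label_cols=["label"])  # crashes here, see below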
Unfortunately, the fit call crashes with an "operation not supported" exception. This is (apparently) because the model-writing code uses random access, which the /dbfs mount does not support.
BigDL has a solution for this: it can write to local storage and then copy the result to the shared folder. This is triggered by setting model_dir to 'dbfs:/tmp/bigdl'. Unfortunately, that causes an exception in the save_pkl function in bigdl/orca/learn/utils.py, because Python's open does not understand the dbfs: scheme (yes, on Databricks some APIs require dbfs:, while others do not support it).
I patched utils.py to support dbfs (line 380):
else:
    if path.startswith("file://"):
        path = path[len("file://"):]
    elif path.startswith("dbfs:/"):  # NEW
        path = "/dbfs/" + path[len("dbfs:/"):]  # NEW
    with open(path, 'wb') as f:
        pickle.dump(data, f)
This seems to work.
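Until there is an official fix, the same mapping can also be applied outside utils.py with a small helper (the function name below is mine, not part of BigDL):

def to_local_fuse_path(path):
    """Map file:// and dbfs:/ URIs to paths that Python's open() can handle.

    On Databricks, dbfs:/... is exposed through the FUSE mount at /dbfs/...
    """
    if path.startswith("file://"):
        return path[len("file://"):]
    if path.startswith("dbfs:/"):
        return "/dbfs/" + path[len("dbfs:/"):]
    return path

assert to_local_fuse_path("dbfs:/tmp/bigdl/state.pkl") == "/dbfs/tmp/bigdl/state.pkl"
assert to_local_fuse_path("file:///tmp/state.pkl") == "/tmp/state.pkl"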
@HaraldVanWoerkom many thanks for your post! I have verified that path = "/dbfs/" + path[len("dbfs:/"):] works on Databricks, and I will create a fix in the source code.