Provide GCS Support in cache_folder #12792

Closed
divyanshud opened this issue Sep 23, 2022 · 0 comments · Fixed by #13141 or #13163
divyanshud commented Sep 23, 2022

The cache_folder config, which sets where models/pipelines are downloaded, extracted, and loaded from, doesn't support GCS at the moment.

I am building a Spark session and passing the cache_folder config as shown below:

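from pyspark.sql import SparkSession

# Parentheses let the chained builder calls span multiple lines.
spark = (
    SparkSession.builder
    .config("spark.dynamicAllocation.maxExecutors", "8")
    .config("spark.executor.cores", "2")
    .config("spark.executor.memory", "8g")
    .config("spark.jars.packages", "com.johnsnowlabs.nlp:spark-nlp_2.12:4.1.0")
    # GCS location for the pretrained cache (the setting this issue is about)
    .config("spark.jsl.settings.pretrained.cache_folder", "gs://<bucket_name>/spark-nlp/cache_pretrained")
    .getOrCreate()
)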

After downloading the model, I got the following error when using it:

Py4JJavaError: An error occurred while calling z:com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize.
: java.lang.ExceptionInInitializerError
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.$anonfun$getDownloadSize$1(ResourceDownloader.scala:724)
at scala.Option.getOrElse(Option.scala:189)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader$.getDownloadSize(ResourceDownloader.scala:724)
at com.johnsnowlabs.nlp.pretrained.PythonResourceDownloader.getDownloadSize(ResourceDownloader.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.IllegalArgumentException: Wrong FS: gs://<bucket_name>/spark-nlp/cache_pretrained, expected: file:///
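
The "Wrong FS ... expected: file:///" message suggests that the gs:// path is being handed to a file system object bound to the local file:/// scheme. The sketch below is only a simplified illustration of that kind of scheme mismatch, written in plain Python; it is not the Spark NLP or Hadoop code, and the helper names (path_scheme, matches_filesystem) and the bucket name my-bucket are hypothetical.

```python
from urllib.parse import urlparse


def path_scheme(path: str) -> str:
    """Return the URI scheme of a path, treating scheme-less paths as local files."""
    return urlparse(path).scheme or "file"


def matches_filesystem(path: str, fs_uri: str = "file:///") -> bool:
    """Return True when the path's scheme matches the file system's scheme.

    Hypothetical helper: mimics the idea behind a "Wrong FS" check,
    not the actual implementation.
    """
    return path_scheme(path) == path_scheme(fs_uri)


# A gs:// cache folder does not match a file system bound to file:///
gcs_ok = matches_filesystem("gs://my-bucket/spark-nlp/cache_pretrained")

# A plain local path does match
local_ok = matches_filesystem("/tmp/cache_pretrained")
```

Under this reading, the failure comes from which file system is chosen to resolve the cache folder rather than from the bucket itself.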
