
LightGBM: NullPointerException in version 0.16 #514

Closed
sully90 opened this issue Mar 13, 2019 · 16 comments

@sully90

sully90 commented Mar 13, 2019

Hi all,

I'm getting the same issue as #405, but with the latest release. We're running Spark on a CentOS 7 cluster and ran into the "GLIBCXX_3.4.20 not found" error (see microsoft/LightGBM#1945), so we had to build the library manually with SWIG and set our Spark driver library path accordingly. While this lets us use LightGBM with mmlspark locally on our Spark cluster, we get the following exception when using two or more nodes:

[Stage 3:>                                                        (0 + 12) / 16]2019-03-08 07:35:23 WARN  TaskSetManager:66 - Lost task 3.0 in stage 3.0 (TID 243, 178.63.65.13, executor 1): java.lang.NullPointerException
at com.microsoft.ml.spark.TrainUtils$.trainLightGBM(TrainUtils.scala:218)
at com.microsoft.ml.spark.LightGBMClassifier$$anonfun$3.apply(LightGBMClassifier.scala:83)
at com.microsoft.ml.spark.LightGBMClassifier$$anonfun$3.apply(LightGBMClassifier.scala:83)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$5.apply(objects.scala:188)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$5.apply(objects.scala:185)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:121)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:402)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:408)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

The Spark worker logs show the same errors as in the previous issue:

2019-03-08 07:35:22 INFO  LightGBMClassifier:192 - LightGBM worker got nodes for network init: null
2019-03-08 07:35:22 INFO  LightGBMClassifier:192 - LightGBM worker got nodes for network init: null
2019-03-08 07:35:22 INFO  LightGBMClassifier:192 - LightGBM worker got nodes for network init: null
2019-03-08 07:35:22 INFO  LightGBMClassifier:192 - LightGBM worker got nodes for network init: null
2019-03-08 07:35:22 INFO  LightGBMClassifier:192 - LightGBM worker got nodes for network init: null
2019-03-08 07:35:22 INFO  LightGBMClassifier:192 - LightGBM worker got nodes for network init: null

We've checked our networking settings and everything seems to be fine, so we're not sure why we're having issues.
Any help would be greatly appreciated!
Many thanks,
David

@imatiach-msft
Contributor

@sully90 really sorry about the trouble you are having. Would you be able to provide more logs from the driver and workers? Also, for the "GLIBCXX_3.4.20 not found" error that you are trying to work around, what would a proper fix look like? I could also give you a debug build with some extra debug info printed out, if you can send me the jar you built.

@sully90
Author

sully90 commented Mar 14, 2019

Hi @imatiach-msft, I've created a gist with the stdout logs here. As for the GLIBCXX_3.4.20 issue, I believe the problem is that CentOS 7 doesn't support the version of libstdc++ that was used to compile LightGBM in the build available via the Spark packages, even though LightGBM is meant to guarantee GLIBCXX <= 3.4.19 as of version 2.2.2 (see microsoft/LightGBM#1858). I haven't tried building an MMLSpark JAR, but have tried the following:

  1. Build LightGBM locally:
git clone --recursive -b v2.2.2 https://github.com/Microsoft/LightGBM
cd LightGBM
export JAVA_HOME=/usr/java/latest
cmake -DUSE_SWIG=ON .
make -j8
  2. Update /opt/spark/spark-2.4.0-bin-hadoop2.7/conf/spark-env.sh on each node and add lib_lightgbm.so and lib_lightgbm_swig.so to $LD_LIBRARY_PATH (see the load check below)
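
For reference, a quick way to check that the locally built library actually loads against the system libstdc++, outside of Spark, is a plain dlopen (a sketch; the .so path is an assumption for your layout):

# Sketch: reproduce the loader error outside Spark via a plain dlopen.
# The path below is an assumption; point it at your locally built library.
import ctypes

try:
    ctypes.CDLL("/opt/LightGBM/lib_lightgbm.so")
    print("lib_lightgbm.so loaded OK")
except OSError as e:
    # On a stock CentOS 7 libstdc++ this is where the
    # "GLIBCXX_3.4.20 not found" message would surface.
    print("load failed:", e)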

Then, when I build a model locally (setting the Spark master to local[*]) and include mmlspark via --packages Azure:mmlspark:0.16, it works fine, as in the sketch below.
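
The working local run looks roughly like this (a sketch; the toy data and column names are placeholders, and it assumes the session was launched with the package above):

# Sketch of the working local[*] run; launch with:
#   pyspark --master "local[*]" --packages Azure:mmlspark:0.16
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from mmlspark import LightGBMClassifier  # import path as of mmlspark 0.16

spark = SparkSession.builder.getOrCreate()

# Toy data; the feature/label columns here are placeholders.
df = spark.createDataFrame(
    [(1.0, 2.0, 0), (2.0, 1.0, 1), (3.0, 4.0, 0), (4.0, 3.0, 1)],
    ["f1", "f2", "label"])
assembled = VectorAssembler(inputCols=["f1", "f2"],
                            outputCol="features").transform(df)
model = LightGBMClassifier(labelCol="label",
                           featuresCol="features").fit(assembled)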

Thanks a lot for the help!

@a3w4e5r

a3w4e5r commented Mar 15, 2019

Hi, for the GLIBCXX_3.4.20 issue, I set a newer release of libstdc++.so.6 via spark.yarn.appMasterEnv/executorEnv, and it works! (master: yarn-cluster)

$SPARK_HOME/bin/spark-submit \
--conf "spark.yarn.appMasterEnv.LD_PRELOAD=libstdc++.so.6" \
--conf "spark.yarn.executorEnv.LD_PRELOAD=libstdc++.so.6" \
--files $WORK_PATH/libstdc++.so.6 \
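
If it helps anyone configuring this programmatically, the same workaround expressed through a SparkSession builder might look like this (a sketch; the path to the shipped libstdc++.so.6 is an assumption):

# Sketch: the LD_PRELOAD workaround set via builder configs instead of
# the spark-submit flags above. The local library path is an assumption.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("yarn")
         .config("spark.yarn.appMasterEnv.LD_PRELOAD", "libstdc++.so.6")
         .config("spark.yarn.executorEnv.LD_PRELOAD", "libstdc++.so.6")
         # equivalent of --files: ships the library to every container
         .config("spark.files", "/opt/libs/libstdc++.so.6")
         .getOrCreate())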

@imatiach-msft
Contributor

@sully90 does the suggestion from @a3w4e5r work for you? Do you still see the network error even with the newer version of libstdc++.so.6?

@imatiach-msft
Contributor

Also, @sully90 would you be able to send the logs for some of the workers that failed? I looked through the driver log you sent but didn't see anything particularly interesting. If we can debug together over skype/teams/hangouts/phone that might be faster too.

@edsonaoki

edsonaoki commented Apr 4, 2019

@a3w4e5r which release of libstdc++.so.6 did you use that works with CentOS 7? I have only used a version for CentOS 6, which I found at https://centos.pkgs.org/6/nux-dextop-x86_64/chrome-deps-stable-3.11-1.x86_64.rpm.html, and which results in a connection error when I try to use the model fit function of LightGBM:

java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:211)
at com.microsoft.ml.spark.TrainUtils$.getNodes(TrainUtils.scala:178)
at com.microsoft.ml.spark.TrainUtils$$anonfun$5.apply(TrainUtils.scala:211)
at com.microsoft.ml.spark.TrainUtils$$anonfun$5.apply(TrainUtils.scala:205)
at com.microsoft.ml.spark.StreamUtilities$.using(StreamUtilities.scala:29)
at com.microsoft.ml.spark.TrainUtils$.trainLightGBM(TrainUtils.scala:204)
at com.microsoft.ml.spark.LightGBMRegressor$$anonfun$3.apply(LightGBMRegressor.scala:90)
at com.microsoft.ml.spark.LightGBMRegressor$$anonfun$3.apply(LightGBMRegressor.scala:90)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:196)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:193)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/04/04 10:34:30 ERROR executor.Executor: Exception in task 0.3 in stage 3.0 (TID 8)
java.net.ConnectException: Connection refused (Connection refused)
[same stack trace repeats]

@a3w4e5r

a3w4e5r commented Apr 4, 2019

@edsonaoki I got libstdc++.so.6 from gcc-7.1.0; it works with RedHat 7.

@edsonaoki

@a3w4e5r thanks! Unfortunately, I don't have shell access to the Yarn executor nodes, which run Red Hat Enterprise Linux 7, so I can't compile gcc there. Is there some way to get a precompiled libstdc++.so.6 for Red Hat 7/CentOS 7?

@a3w4e5r

a3w4e5r commented Apr 8, 2019

@edsonaoki Sorry for the late reply. I can send you mine if you still need it; leave your email.

@edsonaoki

edsonaoki commented Apr 10, 2019

Hi @a3w4e5r, thanks a lot! Can you send it to ?

@edsonaoki

Hi @imatiach-msft and all, I used the version of libstdc++.so.6 for CentOS 7 that @a3w4e5r compiled and kindly sent to me. As before, I don't get the "GLIBCXX_3.4.20 not found" error, but I do get the connection error when I try to use the model fit function of LightGBM. Here are some details from the driver/executor logs:

Driver logs:

2019-04-15 08:59:50 INFO LightGBMRegressor:109 - driver expecting 1 connections...
2019-04-15 08:59:50 INFO LightGBMRegressor:111 - driver accepting a new connection...
2019-04-15 08:59:50 INFO LightGBMRegressor:134 - driver waiting for connections on host: 10.70.22.55 and port: 37521
2019-04-15 08:59:50 INFO LightGBMRegressor:86 - LightGBMRegressor parameters: alpha=0.2 tweedie_variance_power=1.5 is_pre_partition=True boosting_type=gbdt tree_learner=data_parallel num_iterations=100 learning_rate=0.3 num_leaves=31 max_bin=255 bagging_fraction=1.0 bagging_freq=0 bagging_seed=3 early_stopping_round=0 feature_fraction=1.0 max_depth=-1 min_sum_hessian_in_leaf=0.001 num_machines=1 objective=quantile verbosity=1 boost_from_average=true
2019-04-15 08:59:50 INFO SparkContext:54 - Starting job: reduce at LightGBMRegressor.scala:92
2019-04-15 08:59:50 INFO DAGScheduler:54 - Got job 3 (reduce at LightGBMRegressor.scala:92) with 1 output partitions
2019-04-15 08:59:50 INFO DAGScheduler:54 - Final stage: ResultStage 3 (reduce at LightGBMRegressor.scala:92)
2019-04-15 08:59:50 INFO DAGScheduler:54 - Parents of final stage: List()
2019-04-15 08:59:50 INFO DAGScheduler:54 - Missing parents: List()
2019-04-15 08:59:50 INFO DAGScheduler:54 - Submitting ResultStage 3 (MapPartitionsRDD[28] at reduce at LightGBMRegressor.scala:92), which has no missing parents
2019-04-15 08:59:50 INFO MemoryStore:54 - Block broadcast_5 stored as values in memory (estimated size 27.2 KB, free 633.4 MB)
2019-04-15 08:59:50 INFO MemoryStore:54 - Block broadcast_5_piece0 stored as bytes in memory (estimated size 11.9 KB, free 633.4 MB)
2019-04-15 08:59:50 INFO BlockManagerInfo:54 - Added broadcast_5_piece0 in memory on 10.70.22.55:24623 (size: 11.9 KB, free: 633.8 MB)
2019-04-15 08:59:50 INFO SparkContext:54 - Created broadcast 5 from broadcast at DAGScheduler.scala:1006
2019-04-15 08:59:50 INFO DAGScheduler:54 - Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[28] at reduce at LightGBMRegressor.scala:92) (first 15 tasks are for partitions Vector(0))
2019-04-15 08:59:50 INFO YarnScheduler:54 - Adding task set 3.0 with 1 tasks
2019-04-15 08:59:50 INFO TaskSetManager:54 - Starting task 0.0 in stage 3.0 (TID 5, x01gadaapp128a.vsi.sgp.dbs.com, executor 2, partition 0, NODE_LOCAL, 5563 bytes)
2019-04-15 08:59:50 INFO BlockManagerInfo:54 - Added broadcast_5_piece0 in memory on x01gadaapp128a.vsi.sgp.dbs.com:40577 (size: 11.9 KB, free: 366.2 MB)
2019-04-15 08:59:50 INFO BlockManagerInfo:54 - Added rdd_24_0 in memory on x01gadaapp128a.vsi.sgp.dbs.com:40577 (size: 66.7 KB, free: 366.2 MB)
2019-04-15 08:59:52 WARN TaskSetManager:66 - Lost task 0.0 in stage 3.0 (TID 5, x01gadaapp128a.vsi.sgp.dbs.com, executor 2): java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:211)
at com.microsoft.ml.spark.TrainUtils$.getNodes(TrainUtils.scala:178)
at com.microsoft.ml.spark.TrainUtils$$anonfun$5.apply(TrainUtils.scala:211)
at com.microsoft.ml.spark.TrainUtils$$anonfun$5.apply(TrainUtils.scala:205)
at com.microsoft.ml.spark.StreamUtilities$.using(StreamUtilities.scala:29)
at com.microsoft.ml.spark.TrainUtils$.trainLightGBM(TrainUtils.scala:204)
at com.microsoft.ml.spark.LightGBMRegressor$$anonfun$3.apply(LightGBMRegressor.scala:90)
at com.microsoft.ml.spark.LightGBMRegressor$$anonfun$3.apply(LightGBMRegressor.scala:90)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:196)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:193)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Executor logs:

19/04/15 16:59:50 INFO LightGBMRegressor: Successfully bound to port 12432
19/04/15 16:59:52 INFO LightGBMRegressor: LightGBM worker connecting to host: 10.70.22.55 and port: 37521
19/04/15 16:59:52 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 5)
java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at java.net.Socket.connect(Socket.java:538)
at java.net.Socket.<init>(Socket.java:434)
at java.net.Socket.<init>(Socket.java:211)
at com.microsoft.ml.spark.TrainUtils$.getNodes(TrainUtils.scala:178)
at com.microsoft.ml.spark.TrainUtils$$anonfun$5.apply(TrainUtils.scala:211)
at com.microsoft.ml.spark.TrainUtils$$anonfun$5.apply(TrainUtils.scala:205)
at com.microsoft.ml.spark.StreamUtilities$.using(StreamUtilities.scala:29)
at com.microsoft.ml.spark.TrainUtils$.trainLightGBM(TrainUtils.scala:204)
at com.microsoft.ml.spark.LightGBMRegressor$$anonfun$3.apply(LightGBMRegressor.scala:90)
at com.microsoft.ml.spark.LightGBMRegressor$$anonfun$3.apply(LightGBMRegressor.scala:90)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:196)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:193)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

It's a bit strange that, based on the logs, the driver successfully receives a connection from the executors, performs some tasks, and sends a response, yet a connection error with no apparent cause appears later.

For reference, we are running the Spark cluster on Apache Yarn with RHEL 7 (CentOS 7) in the cluster environment.
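
A quick way to rule out a plain firewall problem is to probe the driver's listening port from an executor host (a sketch; the host and port are taken from the driver log above and change on every run):

# Sketch: probe the driver's LightGBM network-init port from an executor
# host. Host/port come from the driver log above and differ per run.
import socket

try:
    with socket.create_connection(("10.70.22.55", 37521), timeout=5):
        print("driver port reachable")
except OSError as e:
    print("cannot reach driver:", e)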

@edsonaoki

Hi everyone. I realised that LightGBM actually works normally with Yarn in cluster mode, just not with Yarn in client mode, where it gives the errors above. Perhaps this is because, in client mode, the driver runs on an Ubuntu machine and hence has a different glibc from the executors (running on RHEL 7)?

@imatiach-msft
Contributor

Closing, as we now use the official Linux .so files produced by the Microsoft/LightGBM build, which uses an Ubuntu 14.04 Docker image that does not have the glibc issue.
This was fixed with the PR:
#526
which updates the LightGBM version to:
"com.microsoft.ml.lightgbm" % "lightgbmlib" % "2.2.350"
It should be available in the next release. For now, you can use the latest builds from master, e.g. the build for that PR was:

--packages com.microsoft.ml.spark:mmlspark_2.11:0.17.dev1+1.g5e0b2a0
--repositories https://mmlspark.azureedge.net/maven
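
For a PySpark session, the equivalent configuration would look roughly like this (a sketch using the coordinates and repository from this comment):

# Sketch: pull the master build referenced above into a PySpark session.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "com.microsoft.ml.spark:mmlspark_2.11:0.17.dev1+1.g5e0b2a0")
         .config("spark.jars.repositories",
                 "https://mmlspark.azureedge.net/maven")
         .getOrCreate())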

@edsonaoki

edsonaoki commented May 6, 2019

@imatiach-msft that's great news! Since I can't compile the package on my own (I only have internet connectivity in a Windows environment), can you make the Python wheel (.whl) file available somewhere? The wheel is unfortunately necessary to use MMLSpark in the Cloudera Workbench in client mode.

@imatiach-msft
Contributor

Adding @mhamilton723 - I'm not sure the Python wheel is enough; you need to specify the underlying Scala code as part of the Spark package. The Python wheel is built at that Maven URL, though I don't have the permissions to look at the blob; I think Mark may have them. Hmm, I feel like there must be a better way for you to add a Spark package in Cloudera Workbench, I just don't know how Cloudera Workbench works. I know that in Databricks (on Azure or AWS) you can add Spark packages using the info above directly, and both the Scala and PySpark APIs are added.

@edsonaoki

Hi @imatiach-msft, to clarify: I can create an internal Maven repository such that, in the Cloudera Workbench, I can add MMLSpark as a Spark package pointing to this internal repository. It works normally when I submit the job using spark-submit.

However, when I use the Cloudera Workbench interactive mode, the Python libraries aren't loaded automatically when adding the Spark package, i.e. "import mmlspark" generates an error. The error goes away when I manually install the Python wheel file, so the wheel file would suffice.
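
For reference, installing the wheel by hand from inside the interactive session could look like this (a sketch; the wheel filename is hypothetical and depends on the build you download):

# Sketch: manual wheel install for interactive sessions where the Spark
# package does not expose the Python side. The filename is hypothetical.
import subprocess
import sys

subprocess.check_call([sys.executable, "-m", "pip", "install",
                       "mmlspark-0.17.dev1-py2.py3-none-any.whl"])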
