I have a for-loop LightGBM fit job for rolling validation.
The job fails on a multi-node cluster with the log error `Connection Refused`; after checking the failed tasks, I found the executor fails with the detailed error message `java.lang.ArrayIndexOutOfBoundsException`, which then causes the `Connection Refused` error.
Meanwhile, the job runs on a single-node cluster without any issue.
The DataFrame sent to the model has around 48,000 records, partitioned as below:
Partition 0 has 19,000 records
Partition 1 has 18,000 records
Partition 2 has 7,000 records
Partition 3 has 4,000 records
The issue cannot be fixed by `df.repartition(5)`.
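For context, the for-loop job described above presumably fits one model per rolling window. A minimal pure-Python sketch of that split logic is below; the window sizes and the `rolling_splits` helper are illustrative assumptions, not the reporter's actual code:

```python
def rolling_splits(n_rows, train_window, test_window):
    """Yield (train_range, test_range) index pairs for a rolling
    (walk-forward) validation over n_rows ordered records.
    Hypothetical helper: the issue does not show the actual split scheme."""
    start = 0
    while start + train_window + test_window <= n_rows:
        train = range(start, start + train_window)
        test = range(start + train_window, start + train_window + test_window)
        yield train, test
        start += test_window

# Example: ~48,000 records, 12,000-row training window, 4,000-row test window.
# Each iteration of the reporter's for-loop would fit a LightGBM model on
# `train` and validate on `test`.
splits = list(rolling_splits(48_000, 12_000, 4_000))
```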
Fixes #2278
Address the `java.lang.ArrayIndexOutOfBoundsException` error in multi-node cluster runs.
* **Error Handling:**
- Add error handling for `scoredDataOutPtr` and `scoredDataLengthLongPtr` pointers in the `score`, `predictLeaf`, `featuresShap`, and `innerPredict` methods in `LightGBMBooster.scala`.
- Ensure proper deletion of `scoredDataOutPtr` and `scoredDataLengthLongPtr` pointers after use in the `innerPredict` method.
* **Testing:**
- Add a new test file `LightGBMBoosterTest.scala`.
- Add test cases to verify that the `score`, `predictLeaf`, `featuresShap`, and `innerPredict` methods handle `scoredDataOutPtr` and `scoredDataLengthLongPtr` pointers correctly.
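The cleanup described above follows the usual acquire/try/finally discipline for native resources. The sketch below illustrates that pattern in Python; the `alloc_ptr`/`free_ptr` names are placeholders, not the SynapseML SWIG API:

```python
def with_native_buffers(alloc_ptr, free_ptr, use):
    """Sketch of the try/finally discipline applied to the
    scoredDataOutPtr / scoredDataLengthLongPtr pointers: allocate the
    native buffers, run the scoring callback, and guarantee deletion
    even if scoring throws. All names here are illustrative."""
    data_ptr = alloc_ptr()
    length_ptr = alloc_ptr()
    try:
        return use(data_ptr, length_ptr)
    finally:
        # Mirrors "ensure proper deletion ... after use in innerPredict":
        # deletion happens on both the success and the failure path.
        free_ptr(length_ptr)
        free_ptr(data_ptr)
```

Without the `finally` block, an `ArrayIndexOutOfBoundsException` raised inside `use` would leak both native buffers.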
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/microsoft/SynapseML/issues/2278?shareId=XXXX-XXXX-XXXX-XXXX).
Hi @dciborow, I can see that the fix PR has been created. Could you confirm whether it will be available for `com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3`? Thanks in advance.
**SynapseML version**

`com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3`

**System information**

No response

**Describe the problem**

I have a for-loop LightGBM fit job for rolling validation.
The job fails on a multi-node cluster with the log error `Connection Refused`; after checking the failed tasks, I found the executor fails with the detailed error message `java.lang.ArrayIndexOutOfBoundsException`, which then causes the `Connection Refused` error.
Meanwhile, the job runs on a single-node cluster without any issue.
The DataFrame sent to the model has around 48,000 records, partitioned as below:
Partition 0 has 19,000 records
Partition 1 has 18,000 records
Partition 2 has 7,000 records
Partition 3 has 4,000 records
The issue cannot be fixed by `df.repartition(5)`.

**Code to reproduce issue**

**Other info / logs**

No response
What component(s) does this bug affect?

- `area/cognitive`: Cognitive project
- `area/core`: Core project
- `area/deep-learning`: DeepLearning project
- `area/lightgbm`: Lightgbm project
- `area/opencv`: Opencv project
- `area/vw`: VW project
- `area/website`: Website
- `area/build`: Project build system
- `area/notebooks`: Samples under notebooks folder
- `area/docker`: Docker usage
- `area/models`: models related issue

What language(s) does this bug affect?

- `language/scala`: Scala source code
- `language/python`: Pyspark APIs
- `language/r`: R APIs
- `language/csharp`: .NET APIs
- `language/new`: Proposals for new client languages

What integration(s) does this bug affect?

- `integrations/synapse`: Azure Synapse integrations
- `integrations/azureml`: Azure ML integrations
- `integrations/databricks`: Databricks integrations