I have a for-loop LightGBM fit job for rolling validation.
The job fails on a multi-node cluster with the log error `Connection Refused`; after checking the failed tasks, I found the executor fails with the detailed error message `java.lang.ArrayIndexOutOfBoundsException`, which then causes the `Connection Refused` error.
Meanwhile, the job runs on a single-node cluster without any issue.
The DataFrame sent to the model has around 48,000 records, partitioned as below:
Partition 0 has 19,000 records
Partition 1 has 18,000 records
Partition 2 has 7,000 records
Partition 3 has 4,000 records
The issue cannot be fixed by `df.repartition(5)`.
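For context, the for-loop job described above presumably fits one model per rolling window. A minimal pure-Python sketch of that split logic is below; the window sizes and the `rolling_splits` helper are illustrative assumptions, not the reporter's actual code:

```python
def rolling_splits(n_rows, train_window, test_window):
    """Yield (train_range, test_range) index pairs for a rolling
    (walk-forward) validation over n_rows ordered records.
    Hypothetical helper: the issue does not show the actual split scheme."""
    start = 0
    while start + train_window + test_window <= n_rows:
        train = range(start, start + train_window)
        test = range(start + train_window, start + train_window + test_window)
        yield train, test
        start += test_window

# Example: ~48,000 records, 12,000-row training window, 4,000-row test window.
# Each iteration of the reporter's for-loop would fit a LightGBM model on
# `train` and validate on `test`.
splits = list(rolling_splits(48_000, 12_000, 4_000))
```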
Fixes #2278
Address the `java.lang.ArrayIndexOutOfBoundsException` error in multi-node cluster runs.
* **Error Handling:**
- Add error handling for `scoredDataOutPtr` and `scoredDataLengthLongPtr` pointers in the `score`, `predictLeaf`, `featuresShap`, and `innerPredict` methods in `LightGBMBooster.scala`.
- Ensure proper deletion of `scoredDataOutPtr` and `scoredDataLengthLongPtr` pointers after use in the `innerPredict` method.
* **Testing:**
- Add a new test file `LightGBMBoosterTest.scala`.
- Add test cases to verify that the `score`, `predictLeaf`, `featuresShap`, and `innerPredict` methods handle `scoredDataOutPtr` and `scoredDataLengthLongPtr` pointers correctly.
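The cleanup described above follows the usual acquire/try/finally discipline for native resources. The sketch below illustrates that pattern in Python; the `alloc_ptr`/`free_ptr` names are placeholders, not the SynapseML SWIG API:

```python
def with_native_buffers(alloc_ptr, free_ptr, use):
    """Sketch of the try/finally discipline applied to the
    scoredDataOutPtr / scoredDataLengthLongPtr pointers: allocate the
    native buffers, run the scoring callback, and guarantee deletion
    even if scoring throws. All names here are illustrative."""
    data_ptr = alloc_ptr()
    length_ptr = alloc_ptr()
    try:
        return use(data_ptr, length_ptr)
    finally:
        # Mirrors "ensure proper deletion ... after use in innerPredict":
        # deletion happens on both the success and the failure path.
        free_ptr(length_ptr)
        free_ptr(data_ptr)
```

Without the `finally` block, an `ArrayIndexOutOfBoundsException` raised inside `use` would leak both native buffers.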
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/microsoft/SynapseML/issues/2278?shareId=XXXX-XXXX-XXXX-XXXX).
Hi @dciborow, I can see that the fix PR has been created. Could you confirm whether it will be available for `com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3`? Thanks in advance.
**SynapseML version**

`com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3`

**System information**

No response

**Describe the problem**

I have a for-loop LightGBM fit job for rolling validation.
The job fails on a multi-node cluster with the log error `Connection Refused`; after checking the failed tasks, I found the executor fails with the detailed error message `java.lang.ArrayIndexOutOfBoundsException`, which then causes the `Connection Refused` error.
Meanwhile, the job runs on a single-node cluster without any issue.
The DataFrame sent to the model has around 48,000 records, partitioned as below:
Partition 0 has 19,000 records
Partition 1 has 18,000 records
Partition 2 has 7,000 records
Partition 3 has 4,000 records
The issue cannot be fixed by `df.repartition(5)`.

**Code to reproduce issue**

**Other info / logs**

No response
What component(s) does this bug affect?

- `area/cognitive`: Cognitive project
- `area/core`: Core project
- `area/deep-learning`: DeepLearning project
- `area/lightgbm`: Lightgbm project
- `area/opencv`: Opencv project
- `area/vw`: VW project
- `area/website`: Website
- `area/build`: Project build system
- `area/notebooks`: Samples under notebooks folder
- `area/docker`: Docker usage
- `area/models`: models related issue

What language(s) does this bug affect?

- `language/scala`: Scala source code
- `language/python`: Pyspark APIs
- `language/r`: R APIs
- `language/csharp`: .NET APIs
- `language/new`: Proposals for new client languages

What integration(s) does this bug affect?

- `integrations/synapse`: Azure Synapse integrations
- `integrations/azureml`: Azure ML integrations
- `integrations/databricks`: Databricks integrations