
[jvm-packages] Xgboost4spark 1.1.1 broken and consistently does not work #5848

Closed
ranInc opened this issue Jul 2, 2020 · 25 comments · Fixed by #5929

ranInc commented Jul 2, 2020

Some models’ predictions fail with the following error:
Check failed: weights_.Size() == num_row_ (15363 vs. 15362) : Size of weights must equal to number of rows.
The numbers in the error are not always the same, but the difference is always 1.
The same data/model works on XGBoost 0.9.

@trivialfis
Member

The size of the weights is equal to either the number of groups or the number of rows. Since you reached this check, I assume you are not doing ranking.

@ranInc
Author

ranInc commented Jul 3, 2020

I am not doing ranking.
I still don't understand what is wrong here, or why this model works on 0.9.

@trivialfis
Member

trivialfis commented Jul 4, 2020

The number of weights should equal the number of rows, since a weight is defined for each data instance.

In later versions of XGBoost we are adding a lot of checks to prevent user errors. If you are using Python or R, XGBoost will even check your parameters.
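To make the failing check concrete, here is a minimal sketch of the invariant being described (illustrative only; the object and method names are hypothetical, not the actual XGBoost source, where this lives in MetaInfo::Validate in src/data/data.cc):

```scala
// Sketch of the weight-size invariant: if weights are supplied, their count
// must match the number of query groups (ranking) or rows (everything else).
object WeightCheck {
  def validate(numRows: Long, numWeights: Long, numGroups: Long): Unit = {
    if (numWeights == 0) return // no weights supplied: nothing to check
    if (numGroups > 0) {
      // Ranking task: one weight per query group.
      require(numWeights == numGroups,
        s"Size of weights must equal number of groups ($numWeights vs. $numGroups)")
    } else {
      // Regression/classification: one weight per data instance (row).
      require(numWeights == numRows,
        s"Size of weights must equal number of rows ($numWeights vs. $numRows)")
    }
  }
}
```

The failure in this issue corresponds to something like validate(15362, 15363, 0): one more weight than rows, which trips the check even though the user never set a weight column.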

@ranInc
Author

ranInc commented Jul 5, 2020

But I am not using weights...
There is no weightCol.

@trivialfis trivialfis reopened this Jul 5, 2020
@ranInc
Author

ranInc commented Jul 6, 2020

Because of this bug I cannot upgrade to 1.1.1, or to Spark 3.0, because this error also seems to happen in 1.0.0.
I want to upload a parquet file with data and a model that reproduces this error so you can investigate the issue, but I don't see where I can do that.

Do you have any estimate of when this can be fixed?
Also, is there a way to work around the issue?

@trivialfis
Member

Do you have something I can run to reproduce this? The bug you described doesn't show up in our tests.

@ranInc
Author

ranInc commented Jul 7, 2020

I uploaded a zip file with the data folder and model folder.
Replace "/tmp/xgboost_test/data" with wherever you unzip the data folder,
and "/tmp/xgboost_test/model" with wherever you unzip the model folder.
This is the Scala test code:

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object XgboostTest {
  def main(args: Array[String]): Unit = {
    // Local Spark session; Hive support matches the original environment.
    val spark = SparkSession.builder().enableHiveSupport().master("local").getOrCreate()
    try {
      val data = spark.read.parquet("/tmp/xgboost_test/data")
      val model = PipelineModel.load("/tmp/xgboost_test/model")
      val predictions = model.transform(data)
      predictions.persist()
      predictions.count() // forces evaluation, which triggers the error
      predictions.show()
    } finally {
      spark.close()
    }
  }
}

xgboost_test.zip

@trivialfis
Member

Thanks! Let me check that later this week. I'm not familiar with Spark, so it might take some time.

@ranInc
Author

ranInc commented Jul 12, 2020

Hi,
any update?

@ranInc ranInc changed the title from "Xgboost4spark 1.1.1 weights_.Size() == num_row_" to "Xgboost4spark 1.1.1 broken and just does not work consistently" Jul 13, 2020
@ranInc ranInc changed the title from "Xgboost4spark 1.1.1 broken and just does not work consistently" to "Xgboost4spark 1.1.1 broken and consistently does not work" Jul 13, 2020
@trivialfis
Member

Not yet.

@ranInc
Author

ranInc commented Jul 22, 2020

still nothing?

@hcho3
Collaborator

hcho3 commented Jul 22, 2020

@ranInc Not yet.

@hcho3
Collaborator

hcho3 commented Jul 22, 2020

@ranInc I managed to reproduce the error on my end.

@hcho3 hcho3 self-assigned this Jul 22, 2020
@hcho3
Collaborator

hcho3 commented Jul 22, 2020

Full error log

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark-3.0.0-bin-hadoop2.7/jars/spark-unsafe_2.12-3.0.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/07/22 21:49:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hive.conf.HiveConf).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/07/22 21:49:54 INFO SparkContext: Running Spark version 3.0.0
20/07/22 21:49:54 INFO ResourceUtils: ==============================================================
20/07/22 21:49:54 INFO ResourceUtils: Resources for spark.driver:

20/07/22 21:49:54 INFO ResourceUtils: ==============================================================
20/07/22 21:49:54 INFO SparkContext: Submitted application: ml.dmlc.xgboost4j.scala.example.spark.Foobar
20/07/22 21:49:54 INFO SecurityManager: Changing view acls to: ubuntu
20/07/22 21:49:54 INFO SecurityManager: Changing modify acls to: ubuntu
20/07/22 21:49:54 INFO SecurityManager: Changing view acls groups to: 
20/07/22 21:49:54 INFO SecurityManager: Changing modify acls groups to: 
20/07/22 21:49:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
20/07/22 21:49:55 INFO Utils: Successfully started service 'sparkDriver' on port 45971.
20/07/22 21:49:55 INFO SparkEnv: Registering MapOutputTracker
20/07/22 21:49:55 INFO SparkEnv: Registering BlockManagerMaster
20/07/22 21:49:55 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/07/22 21:49:55 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/07/22 21:49:55 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/07/22 21:49:55 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ce30809d-cd82-41d9-812a-a8cbbc99e865
20/07/22 21:49:55 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
20/07/22 21:49:55 INFO SparkEnv: Registering OutputCommitCoordinator
20/07/22 21:49:55 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/07/22 21:49:55 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-172-31-56-3.us-west-2.compute.internal:4040
20/07/22 21:49:55 INFO SparkContext: Added JAR file:/home/ubuntu/xgboost/jvm-packages/xgboost4j-tester/target/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://ip-172-31-56-3.us-west-2.compute.internal:45971/jars/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1595454595469
20/07/22 21:49:55 INFO Executor: Starting executor ID driver on host ip-172-31-56-3.us-west-2.compute.internal
20/07/22 21:49:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46265.
20/07/22 21:49:55 INFO NettyBlockTransferService: Server created on ip-172-31-56-3.us-west-2.compute.internal:46265
20/07/22 21:49:55 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/07/22 21:49:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO BlockManagerMasterEndpoint: Registering block manager ip-172-31-56-3.us-west-2.compute.internal:46265 with 434.4 MiB RAM, BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/ubuntu/xgboost/spark-warehouse').
20/07/22 21:49:55 INFO SharedState: Warehouse path is 'file:/home/ubuntu/xgboost/spark-warehouse'.
20/07/22 21:49:56 INFO InMemoryFileIndex: It took 29 ms to list leaf files for 1 paths.
20/07/22 21:49:56 INFO SparkContext: Starting job: parquet at Foobar.scala:26
20/07/22 21:49:56 INFO DAGScheduler: Got job 0 (parquet at Foobar.scala:26) with 1 output partitions
20/07/22 21:49:56 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at Foobar.scala:26)
20/07/22 21:49:56 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:56 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at Foobar.scala:26), which has no missing parents
20/07/22 21:49:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 73.2 KiB, free 434.3 MiB)
20/07/22 21:49:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.2 KiB, free 434.3 MiB)
20/07/22 21:49:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 26.2 KiB, free: 434.4 MiB)
20/07/22 21:49:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at Foobar.scala:26) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/07/22 21:49:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7538 bytes)
20/07/22 21:49:56 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/07/22 21:49:56 INFO Executor: Fetching spark://ip-172-31-56-3.us-west-2.compute.internal:45971/jars/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1595454595469
20/07/22 21:49:57 INFO TransportClientFactory: Successfully created connection to ip-172-31-56-3.us-west-2.compute.internal/172.31.56.3:45971 after 22 ms (0 ms spent in bootstraps)
20/07/22 21:49:57 INFO Utils: Fetching spark://ip-172-31-56-3.us-west-2.compute.internal:45971/jars/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar to /tmp/spark-c0e5a7b0-7d65-44bb-a064-577e88e4c02d/userFiles-a5823bbc-0326-46b8-9f34-1430ef5f29fd/fetchFileTemp7079394570314619959.tmp
20/07/22 21:49:57 INFO Executor: Adding file:/tmp/spark-c0e5a7b0-7d65-44bb-a064-577e88e4c02d/userFiles-a5823bbc-0326-46b8-9f34-1430ef5f29fd/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar to class loader
20/07/22 21:49:57 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 51176 bytes result sent to driver
20/07/22 21:49:57 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 681 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/07/22 21:49:57 INFO DAGScheduler: ResultStage 0 (parquet at Foobar.scala:26) finished in 0.806 s
20/07/22 21:49:57 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:57 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/07/22 21:49:57 INFO DAGScheduler: Job 0 finished: parquet at Foobar.scala:26, took 0.838716 s
20/07/22 21:49:58 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 26.2 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 127.3 KiB, free 434.3 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.6 KiB, free 434.3 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 23.6 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 1 from textFile at ReadWrite.scala:587
20/07/22 21:49:58 INFO FileInputFormat: Total input paths to process : 1
20/07/22 21:49:58 INFO SparkContext: Starting job: first at ReadWrite.scala:587
20/07/22 21:49:58 INFO DAGScheduler: Got job 1 (first at ReadWrite.scala:587) with 1 output partitions
20/07/22 21:49:58 INFO DAGScheduler: Final stage: ResultStage 1 (first at ReadWrite.scala:587)
20/07/22 21:49:58 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:58 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:58 INFO DAGScheduler: Submitting ResultStage 1 (/home/ubuntu/model/metadata MapPartitionsRDD[3] at textFile at ReadWrite.scala:587), which has no missing parents
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.1 KiB, free 434.2 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KiB, free 434.2 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 2.4 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (/home/ubuntu/model/metadata MapPartitionsRDD[3] at textFile at ReadWrite.scala:587) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:58 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
20/07/22 21:49:58 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7384 bytes)
20/07/22 21:49:58 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
20/07/22 21:49:58 INFO HadoopRDD: Input split: file:/home/ubuntu/model/metadata/part-00000:0+210
20/07/22 21:49:58 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1138 bytes result sent to driver
20/07/22 21:49:58 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 29 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:58 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
20/07/22 21:49:58 INFO DAGScheduler: ResultStage 1 (first at ReadWrite.scala:587) finished in 0.037 s
20/07/22 21:49:58 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
20/07/22 21:49:58 INFO DAGScheduler: Job 1 finished: first at ReadWrite.scala:587, took 0.041109 s
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 127.3 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 23.6 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 23.6 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 3 from textFile at ReadWrite.scala:587
20/07/22 21:49:58 INFO FileInputFormat: Total input paths to process : 1
20/07/22 21:49:58 INFO SparkContext: Starting job: first at ReadWrite.scala:587
20/07/22 21:49:58 INFO DAGScheduler: Got job 2 (first at ReadWrite.scala:587) with 1 output partitions
20/07/22 21:49:58 INFO DAGScheduler: Final stage: ResultStage 2 (first at ReadWrite.scala:587)
20/07/22 21:49:58 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:58 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:58 INFO DAGScheduler: Submitting ResultStage 2 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[5] at textFile at ReadWrite.scala:587), which has no missing parents
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.2 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.4 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 2.4 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[5] at textFile at ReadWrite.scala:587) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:58 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
20/07/22 21:49:58 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7423 bytes)
20/07/22 21:49:58 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
20/07/22 21:49:58 INFO HadoopRDD: Input split: file:/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata/part-00000:0+1093
20/07/22 21:49:58 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2023 bytes result sent to driver
20/07/22 21:49:58 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 8 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:58 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
20/07/22 21:49:58 INFO DAGScheduler: ResultStage 2 (first at ReadWrite.scala:587) finished in 0.016 s
20/07/22 21:49:58 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
20/07/22 21:49:58 INFO DAGScheduler: Job 2 finished: first at ReadWrite.scala:587, took 0.019294 s
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 127.3 KiB, free 434.0 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 23.6 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 23.6 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 5 from textFile at DefaultXGBoostParamsReader.scala:82
20/07/22 21:49:58 INFO FileInputFormat: Total input paths to process : 1
20/07/22 21:49:58 INFO SparkContext: Starting job: first at DefaultXGBoostParamsReader.scala:82
20/07/22 21:49:58 INFO DAGScheduler: Got job 3 (first at DefaultXGBoostParamsReader.scala:82) with 1 output partitions
20/07/22 21:49:58 INFO DAGScheduler: Final stage: ResultStage 3 (first at DefaultXGBoostParamsReader.scala:82)
20/07/22 21:49:58 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:58 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:58 INFO DAGScheduler: Submitting ResultStage 3 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[7] at textFile at DefaultXGBoostParamsReader.scala:82), which has no missing parents
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 4.2 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 2.4 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 2.4 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[7] at textFile at DefaultXGBoostParamsReader.scala:82) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:58 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
20/07/22 21:49:58 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7423 bytes)
20/07/22 21:49:58 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
20/07/22 21:49:58 INFO HadoopRDD: Input split: file:/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata/part-00000:0+1093
20/07/22 21:49:58 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 2023 bytes result sent to driver
20/07/22 21:49:58 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 8 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:58 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
20/07/22 21:49:58 INFO DAGScheduler: ResultStage 3 (first at DefaultXGBoostParamsReader.scala:82) finished in 0.015 s
20/07/22 21:49:58 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
20/07/22 21:49:58 INFO DAGScheduler: Job 3 finished: first at DefaultXGBoostParamsReader.scala:82, took 0.017654 s
[21:49:58] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:49:58] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
[21:49:58] WARNING: /home/ubuntu/xgboost/src/learner.cc:736: Loading model from XGBoost < 1.0.0, consider saving it again for improved compatibility
20/07/22 21:49:58 INFO Instrumentation: [27496457] training finished
20/07/22 21:49:58 INFO Instrumentation: [b0644076] training finished
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 64.0 B, free 433.9 MiB)
[21:49:58] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 25.5 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 25.5 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 7 from broadcast at XGBoostRegressor.scala:274
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 23.6 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_5_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 23.6 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_4_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 2.4 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 23.6 KiB, free: 434.4 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 2.4 KiB, free: 434.4 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_6_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 2.4 KiB, free: 434.4 MiB)
20/07/22 21:49:59 INFO FileSourceStrategy: Pruning directories with: 
20/07/22 21:49:59 INFO FileSourceStrategy: Pushed Filters: 
20/07/22 21:49:59 INFO FileSourceStrategy: Post-Scan Filters: 
20/07/22 21:49:59 INFO FileSourceStrategy: Output Data Schema: struct<account_code: bigint, features_7528809678875577807: vector>
20/07/22 21:49:59 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 229.7 KiB, free 434.2 MiB)
20/07/22 21:49:59 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 43.6 KiB, free 434.1 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 43.6 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO SparkContext: Created broadcast 8 from rdd at XGBoostRegressor.scala:277
20/07/22 21:49:59 INFO FileSourceScanExec: Planning scan with bin packing, max size: 29262023 bytes, open cost is considered as scanning 4194304 bytes.
[21:49:59] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
20/07/22 21:49:59 INFO Instrumentation: [61612daa] training finished
20/07/22 21:49:59 INFO CodeGenerator: Code generated in 130.613443 ms
20/07/22 21:49:59 INFO CodeGenerator: Code generated in 12.292134 ms
20/07/22 21:49:59 INFO CodeGenerator: Code generated in 27.259111 ms
20/07/22 21:50:00 INFO SparkContext: Starting job: count at Foobar.scala:30
20/07/22 21:50:00 INFO DAGScheduler: Registering RDD 20 (count at Foobar.scala:30) as input to shuffle 0
20/07/22 21:50:00 INFO DAGScheduler: Got job 4 (count at Foobar.scala:30) with 1 output partitions
20/07/22 21:50:00 INFO DAGScheduler: Final stage: ResultStage 5 (count at Foobar.scala:30)
20/07/22 21:50:00 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 4)
20/07/22 21:50:00 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 4)
20/07/22 21:50:00 INFO DAGScheduler: Submitting ShuffleMapStage 4 (MapPartitionsRDD[20] at count at Foobar.scala:30), which has no missing parents
[21:50:00] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
20/07/22 21:50:00 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 158.9 KiB, free 434.0 MiB)
20/07/22 21:50:00 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 59.7 KiB, free 433.9 MiB)
20/07/22 21:50:00 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 59.7 KiB, free: 434.3 MiB)
20/07/22 21:50:00 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1200
20/07/22 21:50:00 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 4 (MapPartitionsRDD[20] at count at Foobar.scala:30) (first 15 tasks are for partitions Vector(0))
20/07/22 21:50:00 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
20/07/22 21:50:00 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7778 bytes)
20/07/22 21:50:00 INFO Executor: Running task 0.0 in stage 4.0 (TID 4)
[21:50:00] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:50:00] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
20/07/22 21:50:00 INFO CodeGenerator: Code generated in 21.374484 ms
20/07/22 21:50:00 INFO CodeGenerator: Code generated in 13.316838 ms
20/07/22 21:50:00 INFO FileScanRDD: Reading File path: file:///home/ubuntu/data/part-00000-3655a559-8238-46e0-a145-f4b4bbc75ad6-c000.snappy.parquet, range: 0-25067719, partition values: [empty row]
20/07/22 21:50:00 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 1522672 records.
20/07/22 21:50:00 INFO InternalParquetRecordReader: at row 0. reading next block
20/07/22 21:50:00 INFO CodecPool: Got brand-new decompressor [.snappy]
20/07/22 21:50:00 INFO InternalParquetRecordReader: block read in memory in 45 ms. row count = 1522672
[21:50:00] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] !!!!!! Weight exists, size = 32768
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
20/07/22 21:50:01 INFO CodeGenerator: Code generated in 32.378279 ms
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] !!!!!! Weight exists, size = 32768
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] !!!!!! Weight exists, size = 32768
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
20/07/22 21:50:01 WARN BlockManager: Putting block rdd_16_0 failed due to exception ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]

.
20/07/22 21:50:01 WARN BlockManager: Block rdd_16_0 could not be removed as it was not found on disk or in memory
20/07/22 21:50:01 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
20/07/22 21:50:01 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver): ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

20/07/22 21:50:01 ERROR TaskSetManager: Task 0 in stage 4.0 failed 1 times; aborting job
20/07/22 21:50:01 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
20/07/22 21:50:01 INFO TaskSchedulerImpl: Cancelling stage 4
20/07/22 21:50:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage cancelled
20/07/22 21:50:01 INFO DAGScheduler: ShuffleMapStage 4 (count at Foobar.scala:30) failed in 1.696 s due to Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver): ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
20/07/22 21:50:01 INFO DAGScheduler: Job 4 failed: count at Foobar.scala:30, took 1.709690 s
20/07/22 21:50:01 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-56-3.us-west-2.compute.internal:4040
20/07/22 21:50:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/22 21:50:01 INFO MemoryStore: MemoryStore cleared
20/07/22 21:50:01 INFO BlockManager: BlockManager stopped
20/07/22 21:50:01 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/22 21:50:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/22 21:50:01 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver): ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385)
	at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:2979)
	at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:2978)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
	at org.apache.spark.sql.Dataset.count(Dataset.scala:2978)
	at ml.dmlc.xgboost4j.scala.example.spark.Foobar$.main(Foobar.scala:30)
	at ml.dmlc.xgboost4j.scala.example.spark.Foobar.main(Foobar.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
20/07/22 21:50:01 INFO ShutdownHookManager: Shutdown hook called
20/07/22 21:50:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-34f9a30d-5a06-4250-8430-29d5461290ae
20/07/22 21:50:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-c0e5a7b0-7d65-44bb-a064-577e88e4c02d

xgboost4j_spark_crash_log.txt

I used #5925 to log all invocations of the C API functions.

All C API invocations:

[21:49:58] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:49:58] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
[21:49:58] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
[21:49:59] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
[21:50:00] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
[21:50:00] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:50:00] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
[21:50:00] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)

@hcho3
Collaborator

hcho3 commented Jul 23, 2020

@trivialfis @RAMitchell We have an issue with the iterator adaptor. Consider a CSR batch consisting 32768 rows whose the last row is empty (no non-zero element). The common::ParallelGroupBuilder() function (used in SparsePage::Push()) deduces the number of rows in the batch to be 32767, because the last row contained no non-zero element. On the other hand, the weight vector is initialized to be a vector of size 32768 filled with 1.0, because all data points get 1.0 weight by default.

We will need to handle empty trailing rows or columns carefully.

@trivialfis
Member

@hcho3 Glad that you are taking this over.

@hcho3 hcho3 changed the title Xgboost4spark 1.1.1 broken and consistently does not work [jvm-packages] Xgboost4spark 1.1.1 broken and consistently does not work Jul 23, 2020
@hcho3
Collaborator

hcho3 commented Jul 23, 2020

The most minimal example: apply the following patch to the C++ unit test:

diff --git tests/cpp/data/test_adapter.cc tests/cpp/data/test_adapter.cc
index de835358..1da2a71c 100644
--- tests/cpp/data/test_adapter.cc
+++ tests/cpp/data/test_adapter.cc
@@ -73,10 +73,11 @@ class CSRIterForTest {
   std::vector<std::remove_pointer<decltype(std::declval<XGBoostBatchCSR>().index)>::type>
       feature_idx_ {0, 1, 0, 1, 1};
   std::vector<std::remove_pointer<decltype(std::declval<XGBoostBatchCSR>().offset)>::type>
-      row_ptr_ {0, 2, 4, 5};
+      row_ptr_ {0, 2, 4, 5, 5};
   size_t iter_ {0};
 
  public:
+  size_t static constexpr kRows { 4 };  // Test for the last row being empty
   size_t static constexpr kCols { 13 };  // Test for having some missing columns
 
   XGBoostBatchCSR Next() {
@@ -88,7 +89,7 @@ class CSRIterForTest {
     batch.offset = dmlc::BeginPtr(row_ptr_);
     batch.index = dmlc::BeginPtr(feature_idx_);
     batch.value = dmlc::BeginPtr(data_);
-    batch.size = 3;
+    batch.size = kRows;
 
     batch.label = nullptr;
     batch.weight = nullptr;
@@ -117,11 +118,11 @@ int CSRSetDataNextForTest(DataIterHandle data_handle,
   }
 }
 
-TEST(Adapter, IteratorAdaper) {
+TEST(Adapter, IteratorAdapter) {
   CSRIterForTest iter;
   data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext,
                         XGBoostBatchCSR> adapter{&iter, CSRSetDataNextForTest};
-  constexpr size_t kRows { 6 };
+  constexpr size_t kRows { 8 };
 
   std::unique_ptr<DMatrix> data {
     DMatrix::Create(&adapter, std::numeric_limits<float>::quiet_NaN(), 1)
@@ -129,4 +130,5 @@ TEST(Adapter, IteratorAdaper) {
   ASSERT_EQ(data->Info().num_col_, CSRIterForTest::kCols);
   ASSERT_EQ(data->Info().num_row_, kRows);
 }
+
 }  // namespace xgboost

Log from ./build/testxgboost --gtest_filter=Adapter.IteratorAdapter:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Adapter
[ RUN      ] Adapter.IteratorAdapter
/home/ubuntu/xgboost/tests/cpp/data/test_adapter.cc:131: Failure
Expected equality of these values:
  data->Info().num_row_
    Which is: 7
  kRows
    Which is: 8
[  FAILED  ] Adapter.IteratorAdapter (0 ms)
[----------] 1 test from Adapter (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Adapter.IteratorAdapter

 1 FAILED TEST

The example shows a matrix where Row IDs 3 and 7 are empty.

The NativeDataIter in XGBoost 0.90 used a subclass of dmlc::Parser<> and thus handled empty rows correctly.

@hcho3
Copy link
Collaborator

hcho3 commented Jul 23, 2020

The bug affects IteratorAdapter and FileAdapter, because both return kAdapterUnknownSize from their NumRows() method. The other three adapters have a NumRows() method that returns a concrete count, so they are not affected by the bug.

Minimal example for FileAdapter:

diff --git tests/cpp/data/test_simple_dmatrix.cc tests/cpp/data/test_simple_dmatrix.cc
index 691dc854..563a4949 100644
--- tests/cpp/data/test_simple_dmatrix.cc
+++ tests/cpp/data/test_simple_dmatrix.cc
@@ -185,16 +185,21 @@ TEST(SimpleDMatrix, FromCSC) {
 TEST(SimpleDMatrix, FromFile) {
   std::string filename = "test.libsvm";
   CreateBigTestData(filename, 3 * 5);
+  {
+    std::ofstream fo(filename, std::ios::app | std::ios::out);
+    fo << "0\n";
+  }
+  constexpr size_t expected_nrow = 6;
   std::unique_ptr<dmlc::Parser<uint32_t>> parser(
       dmlc::Parser<uint32_t>::Create(filename.c_str(), 0, 1, "auto"));
 
   auto verify_batch = [](SparsePage const &batch) {
-    EXPECT_EQ(batch.Size(), 5);
+    EXPECT_EQ(batch.Size(), expected_nrow);
     EXPECT_EQ(batch.offset.HostVector(),
-              std::vector<bst_row_t>({0, 3, 6, 9, 12, 15}));
+              std::vector<bst_row_t>({0, 3, 6, 9, 12, 15, 15}));
     EXPECT_EQ(batch.base_rowid, 0);
 
-    for (auto i = 0ull; i < batch.Size(); i++) {
+    for (auto i = 0ull; i < batch.Size() - 1; i++) {
       if (i % 2 == 0) {
         EXPECT_EQ(batch[i][0].index, 0);
         EXPECT_EQ(batch[i][1].index, 1);

@hcho3

This comment has been minimized.

@hcho3
Collaborator

hcho3 commented Jul 23, 2020

@ranInc Hello, I submitted a pull request #5929 to fix the issue. If you'd like to try it out, run

git clone --recursive https://github.com/hcho3/xgboost -b handle_empty_rows
cd xgboost/jvm-packages
mvn package     # or mvn install

@ranInc
Author

ranInc commented Jul 23, 2020

Hi, I think I will be able to test it out next week. Thanks!

@ranInc
Author

ranInc commented Jul 26, 2020

Hi,
so I guess I need to wait for the 1.2 release or use a non-release build of 1.2?

@hcho3
Collaborator

hcho3 commented Jul 26, 2020

@ranInc Yes. You can either wait for the 1.2.0 release or use the SNAPSHOT version.
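For reference, depending on the SNAPSHOT version from Maven would look roughly like the pom.xml fragment below. This is a sketch only: the artifact coordinates follow the ml.dmlc naming used by the JVM packages, and the repository URL is a placeholder — substitute the project's actual snapshot repository from its documentation.

```xml
<!-- Sketch only: the repository URL is a placeholder, not the real
     XGBoost snapshot repository; see the project docs for the actual one. -->
<repositories>
  <repository>
    <id>xgboost-snapshots</id>
    <url>https://example.com/xgboost-maven-repo/snapshot</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>1.2.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```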

@Bishop-Cui

@ranInc Yes. You can either wait for the 1.2.0 release or use the SNAPSHOT version.

Hello Hyunsu, I ran into the same problem that ranInc reported, on 1.1.2. It's tricky because our environment is fixed, so we cannot use 1.2.0, and when I tried to download the handle_empty_rows branch in your fork, it could not be found. Would you happen to have any suggestions?

@hcho3
Collaborator

hcho3 commented Aug 17, 2023

The bug has long been fixed, starting from 1.2.0. Please upgrade to the latest XGBoost; we are not able to support very old versions.
