
[jvm-packages] Xgboost4spark 1.1.1 broken and consistently does not work #5848

Closed
ranInc opened this issue Jul 2, 2020 · 25 comments · Fixed by #5929

ranInc commented Jul 2, 2020

Some models’ predictions fail with the following error:
Check failed: weights_.Size() == num_row_ (15363 vs. 15362) : Size of weights must equal to number of rows.
The numbers in the error are not always the same, but the difference is always 1.
The same data/model works on XGBoost 0.9.

@trivialfis
Member

The size of the weights is equal to either the number of groups or the number of rows. Since you reached this check, I assume you are not doing ranking.

@ranInc
Author

ranInc commented Jul 3, 2020

I am not doing ranking.
I still don't understand what is wrong here, or why this model works on 0.9.

@trivialfis
Member

trivialfis commented Jul 4, 2020

The number of weights should equal the number of rows, since a weight is defined for each data instance.

In later versions of XGBoost we are adding a lot of checks to prevent user errors. If you are using Python or R, XGBoost will even check your parameters.
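To make the failing check concrete, here is a minimal sketch of the invariant being described (illustrative only; the object and method names are hypothetical, not the actual XGBoost source, where this lives in MetaInfo::Validate in src/data/data.cc):

```scala
// Sketch of the weight-size invariant: if weights are supplied, their count
// must match the number of query groups (ranking) or rows (everything else).
object WeightCheck {
  def validate(numRows: Long, numWeights: Long, numGroups: Long): Unit = {
    if (numWeights == 0) return // no weights supplied: nothing to check
    if (numGroups > 0) {
      // Ranking task: one weight per query group.
      require(numWeights == numGroups,
        s"Size of weights must equal number of groups ($numWeights vs. $numGroups)")
    } else {
      // Regression/classification: one weight per data instance (row).
      require(numWeights == numRows,
        s"Size of weights must equal number of rows ($numWeights vs. $numRows)")
    }
  }
}
```

The failure in this issue corresponds to something like validate(15362, 15363, 0): one more weight than rows, which trips the check even though the user never set a weight column.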

@ranInc
Author

ranInc commented Jul 5, 2020

But I am not using weights...
There is no weightCol.

@trivialfis trivialfis reopened this Jul 5, 2020
@ranInc
Author

ranInc commented Jul 6, 2020

Because of this bug I cannot upgrade to 1.1.1, or to Spark 3.0, because this error also seems to happen in 1.0.0.
I want to upload a parquet file with data and a model that reproduces this error so you can investigate the issue, but I don't see where I can do that.

Do you have any estimate of when this can be fixed?
Also, is there a way to work around the issue?

@trivialfis
Member

Do you have something I can run to reproduce this? The bug you described doesn't show up in our tests.

@ranInc
Author

ranInc commented Jul 7, 2020

I uploaded a zip file with the data folder and model folder.
Replace "/tmp/xgboost_test/data" with wherever you unzip the data folder,
and "/tmp/xgboost_test/model" with wherever you unzip the model folder.
This is the Scala test code:

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

object XgboostTest {
  def main(args: Array[String]): Unit = {
    // Local Spark session; Hive support matches the original environment.
    val spark = SparkSession.builder().enableHiveSupport().master("local").getOrCreate()
    try {
      val data = spark.read.parquet("/tmp/xgboost_test/data")
      val model = PipelineModel.load("/tmp/xgboost_test/model")
      val predictions = model.transform(data)
      predictions.persist()
      predictions.count() // forces evaluation, which triggers the error
      predictions.show()
    } finally {
      spark.close()
    }
  }
}

xgboost_test.zip

@trivialfis
Member

Thanks! Let me check that later this week. I'm not familiar with Spark, so it might take some time.

@ranInc
Author

ranInc commented Jul 12, 2020

Hi,
any update?

@ranInc ranInc changed the title from "Xgboost4spark 1.1.1 weights_.Size() == num_row_" to "Xgboost4spark 1.1.1 broken and just does not work consistently" Jul 13, 2020
@ranInc ranInc changed the title from "Xgboost4spark 1.1.1 broken and just does not work consistently" to "Xgboost4spark 1.1.1 broken and consistently does not work" Jul 13, 2020
@trivialfis
Member

Not yet.

@ranInc
Author

ranInc commented Jul 22, 2020

still nothing?

@hcho3
Collaborator

hcho3 commented Jul 22, 2020

@ranInc Not yet.

@hcho3
Collaborator

hcho3 commented Jul 22, 2020

@ranInc I managed to reproduce the error on my end.

@hcho3 hcho3 self-assigned this Jul 22, 2020
@hcho3
Collaborator

hcho3 commented Jul 22, 2020

Full error log

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark-3.0.0-bin-hadoop2.7/jars/spark-unsafe_2.12-3.0.0.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/07/22 21:49:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hive.conf.HiveConf).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/07/22 21:49:54 INFO SparkContext: Running Spark version 3.0.0
20/07/22 21:49:54 INFO ResourceUtils: ==============================================================
20/07/22 21:49:54 INFO ResourceUtils: Resources for spark.driver:

20/07/22 21:49:54 INFO ResourceUtils: ==============================================================
20/07/22 21:49:54 INFO SparkContext: Submitted application: ml.dmlc.xgboost4j.scala.example.spark.Foobar
20/07/22 21:49:54 INFO SecurityManager: Changing view acls to: ubuntu
20/07/22 21:49:54 INFO SecurityManager: Changing modify acls to: ubuntu
20/07/22 21:49:54 INFO SecurityManager: Changing view acls groups to: 
20/07/22 21:49:54 INFO SecurityManager: Changing modify acls groups to: 
20/07/22 21:49:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(ubuntu); groups with view permissions: Set(); users  with modify permissions: Set(ubuntu); groups with modify permissions: Set()
20/07/22 21:49:55 INFO Utils: Successfully started service 'sparkDriver' on port 45971.
20/07/22 21:49:55 INFO SparkEnv: Registering MapOutputTracker
20/07/22 21:49:55 INFO SparkEnv: Registering BlockManagerMaster
20/07/22 21:49:55 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
20/07/22 21:49:55 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
20/07/22 21:49:55 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
20/07/22 21:49:55 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-ce30809d-cd82-41d9-812a-a8cbbc99e865
20/07/22 21:49:55 INFO MemoryStore: MemoryStore started with capacity 434.4 MiB
20/07/22 21:49:55 INFO SparkEnv: Registering OutputCommitCoordinator
20/07/22 21:49:55 INFO Utils: Successfully started service 'SparkUI' on port 4040.
20/07/22 21:49:55 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://ip-172-31-56-3.us-west-2.compute.internal:4040
20/07/22 21:49:55 INFO SparkContext: Added JAR file:/home/ubuntu/xgboost/jvm-packages/xgboost4j-tester/target/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar at spark://ip-172-31-56-3.us-west-2.compute.internal:45971/jars/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1595454595469
20/07/22 21:49:55 INFO Executor: Starting executor ID driver on host ip-172-31-56-3.us-west-2.compute.internal
20/07/22 21:49:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 46265.
20/07/22 21:49:55 INFO NettyBlockTransferService: Server created on ip-172-31-56-3.us-west-2.compute.internal:46265
20/07/22 21:49:55 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
20/07/22 21:49:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO BlockManagerMasterEndpoint: Registering block manager ip-172-31-56-3.us-west-2.compute.internal:46265 with 434.4 MiB RAM, BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, ip-172-31-56-3.us-west-2.compute.internal, 46265, None)
20/07/22 21:49:55 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/home/ubuntu/xgboost/spark-warehouse').
20/07/22 21:49:55 INFO SharedState: Warehouse path is 'file:/home/ubuntu/xgboost/spark-warehouse'.
20/07/22 21:49:56 INFO InMemoryFileIndex: It took 29 ms to list leaf files for 1 paths.
20/07/22 21:49:56 INFO SparkContext: Starting job: parquet at Foobar.scala:26
20/07/22 21:49:56 INFO DAGScheduler: Got job 0 (parquet at Foobar.scala:26) with 1 output partitions
20/07/22 21:49:56 INFO DAGScheduler: Final stage: ResultStage 0 (parquet at Foobar.scala:26)
20/07/22 21:49:56 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:56 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:56 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at parquet at Foobar.scala:26), which has no missing parents
20/07/22 21:49:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 73.2 KiB, free 434.3 MiB)
20/07/22 21:49:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.2 KiB, free 434.3 MiB)
20/07/22 21:49:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 26.2 KiB, free: 434.4 MiB)
20/07/22 21:49:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at parquet at Foobar.scala:26) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
20/07/22 21:49:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7538 bytes)
20/07/22 21:49:56 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
20/07/22 21:49:56 INFO Executor: Fetching spark://ip-172-31-56-3.us-west-2.compute.internal:45971/jars/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar with timestamp 1595454595469
20/07/22 21:49:57 INFO TransportClientFactory: Successfully created connection to ip-172-31-56-3.us-west-2.compute.internal/172.31.56.3:45971 after 22 ms (0 ms spent in bootstraps)
20/07/22 21:49:57 INFO Utils: Fetching spark://ip-172-31-56-3.us-west-2.compute.internal:45971/jars/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar to /tmp/spark-c0e5a7b0-7d65-44bb-a064-577e88e4c02d/userFiles-a5823bbc-0326-46b8-9f34-1430ef5f29fd/fetchFileTemp7079394570314619959.tmp
20/07/22 21:49:57 INFO Executor: Adding file:/tmp/spark-c0e5a7b0-7d65-44bb-a064-577e88e4c02d/userFiles-a5823bbc-0326-46b8-9f34-1430ef5f29fd/xgboost4j-tester_2.12-1.0-SNAPSHOT-jar-with-dependencies.jar to class loader
20/07/22 21:49:57 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 51176 bytes result sent to driver
20/07/22 21:49:57 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 681 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
20/07/22 21:49:57 INFO DAGScheduler: ResultStage 0 (parquet at Foobar.scala:26) finished in 0.806 s
20/07/22 21:49:57 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:57 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
20/07/22 21:49:57 INFO DAGScheduler: Job 0 finished: parquet at Foobar.scala:26, took 0.838716 s
20/07/22 21:49:58 INFO BlockManagerInfo: Removed broadcast_0_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 26.2 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 127.3 KiB, free 434.3 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 23.6 KiB, free 434.3 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 23.6 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 1 from textFile at ReadWrite.scala:587
20/07/22 21:49:58 INFO FileInputFormat: Total input paths to process : 1
20/07/22 21:49:58 INFO SparkContext: Starting job: first at ReadWrite.scala:587
20/07/22 21:49:58 INFO DAGScheduler: Got job 1 (first at ReadWrite.scala:587) with 1 output partitions
20/07/22 21:49:58 INFO DAGScheduler: Final stage: ResultStage 1 (first at ReadWrite.scala:587)
20/07/22 21:49:58 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:58 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:58 INFO DAGScheduler: Submitting ResultStage 1 (/home/ubuntu/model/metadata MapPartitionsRDD[3] at textFile at ReadWrite.scala:587), which has no missing parents
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 4.1 KiB, free 434.2 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 2.4 KiB, free 434.2 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 2.4 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (/home/ubuntu/model/metadata MapPartitionsRDD[3] at textFile at ReadWrite.scala:587) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:58 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
20/07/22 21:49:58 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7384 bytes)
20/07/22 21:49:58 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
20/07/22 21:49:58 INFO HadoopRDD: Input split: file:/home/ubuntu/model/metadata/part-00000:0+210
20/07/22 21:49:58 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1138 bytes result sent to driver
20/07/22 21:49:58 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 29 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:58 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
20/07/22 21:49:58 INFO DAGScheduler: ResultStage 1 (first at ReadWrite.scala:587) finished in 0.037 s
20/07/22 21:49:58 INFO DAGScheduler: Job 1 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
20/07/22 21:49:58 INFO DAGScheduler: Job 1 finished: first at ReadWrite.scala:587, took 0.041109 s
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 127.3 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 23.6 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 23.6 KiB, free: 434.4 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 3 from textFile at ReadWrite.scala:587
20/07/22 21:49:58 INFO FileInputFormat: Total input paths to process : 1
20/07/22 21:49:58 INFO SparkContext: Starting job: first at ReadWrite.scala:587
20/07/22 21:49:58 INFO DAGScheduler: Got job 2 (first at ReadWrite.scala:587) with 1 output partitions
20/07/22 21:49:58 INFO DAGScheduler: Final stage: ResultStage 2 (first at ReadWrite.scala:587)
20/07/22 21:49:58 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:58 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:58 INFO DAGScheduler: Submitting ResultStage 2 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[5] at textFile at ReadWrite.scala:587), which has no missing parents
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 4.2 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.4 KiB, free 434.1 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 2.4 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[5] at textFile at ReadWrite.scala:587) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:58 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
20/07/22 21:49:58 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7423 bytes)
20/07/22 21:49:58 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
20/07/22 21:49:58 INFO HadoopRDD: Input split: file:/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata/part-00000:0+1093
20/07/22 21:49:58 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2023 bytes result sent to driver
20/07/22 21:49:58 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 8 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:58 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
20/07/22 21:49:58 INFO DAGScheduler: ResultStage 2 (first at ReadWrite.scala:587) finished in 0.016 s
20/07/22 21:49:58 INFO DAGScheduler: Job 2 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 2: Stage finished
20/07/22 21:49:58 INFO DAGScheduler: Job 2 finished: first at ReadWrite.scala:587, took 0.019294 s
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 127.3 KiB, free 434.0 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 23.6 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 23.6 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 5 from textFile at DefaultXGBoostParamsReader.scala:82
20/07/22 21:49:58 INFO FileInputFormat: Total input paths to process : 1
20/07/22 21:49:58 INFO SparkContext: Starting job: first at DefaultXGBoostParamsReader.scala:82
20/07/22 21:49:58 INFO DAGScheduler: Got job 3 (first at DefaultXGBoostParamsReader.scala:82) with 1 output partitions
20/07/22 21:49:58 INFO DAGScheduler: Final stage: ResultStage 3 (first at DefaultXGBoostParamsReader.scala:82)
20/07/22 21:49:58 INFO DAGScheduler: Parents of final stage: List()
20/07/22 21:49:58 INFO DAGScheduler: Missing parents: List()
20/07/22 21:49:58 INFO DAGScheduler: Submitting ResultStage 3 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[7] at textFile at DefaultXGBoostParamsReader.scala:82), which has no missing parents
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_6 stored as values in memory (estimated size 4.2 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 2.4 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 2.4 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1200
20/07/22 21:49:58 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata MapPartitionsRDD[7] at textFile at DefaultXGBoostParamsReader.scala:82) (first 15 tasks are for partitions Vector(0))
20/07/22 21:49:58 INFO TaskSchedulerImpl: Adding task set 3.0 with 1 tasks
20/07/22 21:49:58 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 3, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7423 bytes)
20/07/22 21:49:58 INFO Executor: Running task 0.0 in stage 3.0 (TID 3)
20/07/22 21:49:58 INFO HadoopRDD: Input split: file:/home/ubuntu/model/stages/0_XGBoostRegressor_dfcc0f11d073/metadata/part-00000:0+1093
20/07/22 21:49:58 INFO Executor: Finished task 0.0 in stage 3.0 (TID 3). 2023 bytes result sent to driver
20/07/22 21:49:58 INFO TaskSetManager: Finished task 0.0 in stage 3.0 (TID 3) in 8 ms on ip-172-31-56-3.us-west-2.compute.internal (executor driver) (1/1)
20/07/22 21:49:58 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 
20/07/22 21:49:58 INFO DAGScheduler: ResultStage 3 (first at DefaultXGBoostParamsReader.scala:82) finished in 0.015 s
20/07/22 21:49:58 INFO DAGScheduler: Job 3 is finished. Cancelling potential speculative or zombie tasks for this job
20/07/22 21:49:58 INFO TaskSchedulerImpl: Killing all running tasks in stage 3: Stage finished
20/07/22 21:49:58 INFO DAGScheduler: Job 3 finished: first at DefaultXGBoostParamsReader.scala:82, took 0.017654 s
[21:49:58] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:49:58] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
[21:49:58] WARNING: /home/ubuntu/xgboost/src/learner.cc:736: Loading model from XGBoost < 1.0.0, consider saving it again for improved compatibility
20/07/22 21:49:58 INFO Instrumentation: [27496457] training finished
20/07/22 21:49:58 INFO Instrumentation: [b0644076] training finished
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 64.0 B, free 433.9 MiB)
[21:49:58] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
20/07/22 21:49:58 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 25.5 KiB, free 433.9 MiB)
20/07/22 21:49:58 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 25.5 KiB, free: 434.3 MiB)
20/07/22 21:49:58 INFO SparkContext: Created broadcast 7 from broadcast at XGBoostRegressor.scala:274
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_3_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 23.6 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_5_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 23.6 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_4_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 2.4 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_1_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 23.6 KiB, free: 434.4 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_2_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 2.4 KiB, free: 434.4 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Removed broadcast_6_piece0 on ip-172-31-56-3.us-west-2.compute.internal:46265 in memory (size: 2.4 KiB, free: 434.4 MiB)
20/07/22 21:49:59 INFO FileSourceStrategy: Pruning directories with: 
20/07/22 21:49:59 INFO FileSourceStrategy: Pushed Filters: 
20/07/22 21:49:59 INFO FileSourceStrategy: Post-Scan Filters: 
20/07/22 21:49:59 INFO FileSourceStrategy: Output Data Schema: struct<account_code: bigint, features_7528809678875577807: vector>
20/07/22 21:49:59 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 229.7 KiB, free 434.2 MiB)
20/07/22 21:49:59 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 43.6 KiB, free 434.1 MiB)
20/07/22 21:49:59 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 43.6 KiB, free: 434.3 MiB)
20/07/22 21:49:59 INFO SparkContext: Created broadcast 8 from rdd at XGBoostRegressor.scala:277
20/07/22 21:49:59 INFO FileSourceScanExec: Planning scan with bin packing, max size: 29262023 bytes, open cost is considered as scanning 4194304 bytes.
[21:49:59] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
20/07/22 21:49:59 INFO Instrumentation: [61612daa] training finished
20/07/22 21:49:59 INFO CodeGenerator: Code generated in 130.613443 ms
20/07/22 21:49:59 INFO CodeGenerator: Code generated in 12.292134 ms
20/07/22 21:49:59 INFO CodeGenerator: Code generated in 27.259111 ms
20/07/22 21:50:00 INFO SparkContext: Starting job: count at Foobar.scala:30
20/07/22 21:50:00 INFO DAGScheduler: Registering RDD 20 (count at Foobar.scala:30) as input to shuffle 0
20/07/22 21:50:00 INFO DAGScheduler: Got job 4 (count at Foobar.scala:30) with 1 output partitions
20/07/22 21:50:00 INFO DAGScheduler: Final stage: ResultStage 5 (count at Foobar.scala:30)
20/07/22 21:50:00 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 4)
20/07/22 21:50:00 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 4)
20/07/22 21:50:00 INFO DAGScheduler: Submitting ShuffleMapStage 4 (MapPartitionsRDD[20] at count at Foobar.scala:30), which has no missing parents
[21:50:00] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
20/07/22 21:50:00 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 158.9 KiB, free 434.0 MiB)
20/07/22 21:50:00 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 59.7 KiB, free 433.9 MiB)
20/07/22 21:50:00 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on ip-172-31-56-3.us-west-2.compute.internal:46265 (size: 59.7 KiB, free: 434.3 MiB)
20/07/22 21:50:00 INFO SparkContext: Created broadcast 9 from broadcast at DAGScheduler.scala:1200
20/07/22 21:50:00 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 4 (MapPartitionsRDD[20] at count at Foobar.scala:30) (first 15 tasks are for partitions Vector(0))
20/07/22 21:50:00 INFO TaskSchedulerImpl: Adding task set 4.0 with 1 tasks
20/07/22 21:50:00 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver, partition 0, PROCESS_LOCAL, 7778 bytes)
20/07/22 21:50:00 INFO Executor: Running task 0.0 in stage 4.0 (TID 4)
[21:50:00] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:50:00] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
20/07/22 21:50:00 INFO CodeGenerator: Code generated in 21.374484 ms
20/07/22 21:50:00 INFO CodeGenerator: Code generated in 13.316838 ms
20/07/22 21:50:00 INFO FileScanRDD: Reading File path: file:///home/ubuntu/data/part-00000-3655a559-8238-46e0-a145-f4b4bbc75ad6-c000.snappy.parquet, range: 0-25067719, partition values: [empty row]
20/07/22 21:50:00 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 1522672 records.
20/07/22 21:50:00 INFO InternalParquetRecordReader: at row 0. reading next block
20/07/22 21:50:00 INFO CodecPool: Got brand-new decompressor [.snappy]
20/07/22 21:50:00 INFO InternalParquetRecordReader: block read in memory in 45 ms. row count = 1522672
[21:50:00] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] !!!!!! Weight exists, size = 32768
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
20/07/22 21:50:01 INFO CodeGenerator: Code generated in 32.378279 ms
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] !!!!!! Weight exists, size = 32768
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] !!!!!! Weight exists, size = 32768
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
20/07/22 21:50:01 WARN BlockManager: Putting block rdd_16_0 failed due to exception ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]

.
20/07/22 21:50:01 WARN BlockManager: Block rdd_16_0 could not be removed as it was not found on disk or in memory
20/07/22 21:50:01 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 4)
ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
20/07/22 21:50:01 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver): ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

20/07/22 21:50:01 ERROR TaskSetManager: Task 0 in stage 4.0 failed 1 times; aborting job
20/07/22 21:50:01 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool 
20/07/22 21:50:01 INFO TaskSchedulerImpl: Cancelling stage 4
20/07/22 21:50:01 INFO TaskSchedulerImpl: Killing all running tasks in stage 4: Stage cancelled
20/07/22 21:50:01 INFO DAGScheduler: ShuffleMapStage 4 (count at Foobar.scala:30) failed in 1.696 s due to Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver): ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
20/07/22 21:50:01 INFO DAGScheduler: Job 4 failed: count at Foobar.scala:30, took 1.709690 s
20/07/22 21:50:01 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-56-3.us-west-2.compute.internal:4040
20/07/22 21:50:01 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/22 21:50:01 INFO MemoryStore: MemoryStore cleared
20/07/22 21:50:01 INFO BlockManager: BlockManager stopped
20/07/22 21:50:01 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/22 21:50:01 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/22 21:50:01 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 4, ip-172-31-56-3.us-west-2.compute.internal, executor driver): ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2023)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:1972)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:1971)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1971)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:950)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:950)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:950)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2203)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2152)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2141)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:752)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2093)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2114)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2133)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2158)
	at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1004)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:388)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:1003)
	at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:385)
	at org.apache.spark.sql.Dataset.$anonfun$count$1(Dataset.scala:2979)
	at org.apache.spark.sql.Dataset.$anonfun$count$1$adapted(Dataset.scala:2978)
	at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3616)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:100)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:160)
	at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:87)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:763)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3614)
	at org.apache.spark.sql.Dataset.count(Dataset.scala:2978)
	at ml.dmlc.xgboost4j.scala.example.spark.Foobar$.main(Foobar.scala:30)
	at ml.dmlc.xgboost4j.scala.example.spark.Foobar.main(Foobar.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:928)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: ml.dmlc.xgboost4j.java.XGBoostError: [21:50:01] /home/ubuntu/xgboost/src/data/data.cc:524: Check failed: weights_.Size() == num_row_ (32768 vs. 32767) : Size of weights must equal to number of rows.
Stack trace:
  [bt] (0) /tmp/libxgboost4j977590708566383708.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f2a0cb79acc]
  [bt] (1) /tmp/libxgboost4j977590708566383708.so(xgboost::MetaInfo::Validate(int) const+0x106) [0x7f2a0cbefe66]
  [bt] (2) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::ValidateDMatrix(xgboost::DMatrix*) const+0x37) [0x7f2a0cc7aec7]
  [bt] (3) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::PredictRaw(xgboost::DMatrix*, xgboost::PredictionCacheEntry*, bool, unsigned int) const+0x44) [0x7f2a0cc7b104]
  [bt] (4) /tmp/libxgboost4j977590708566383708.so(xgboost::LearnerImpl::Predict(std::shared_ptr<xgboost::DMatrix>, bool, xgboost::HostDeviceVector<float>*, unsigned int, bool, bool, bool, bool, bool)+0x123) [0x7f2a0cc7d423]
  [bt] (5) /tmp/libxgboost4j977590708566383708.so(XGBoosterPredict+0x13d) [0x7f2a0cb7e62d]
  [bt] (6) /tmp/libxgboost4j977590708566383708.so(Java_ml_dmlc_xgboost4j_java_XGBoostJNI_XGBoosterPredict+0x43) [0x7f2a0cb71c73]
  [bt] (7) [0x7f2b5491c890]


	at ml.dmlc.xgboost4j.java.XGBoostJNI.checkCall(XGBoostJNI.java:48)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:312)
	at ml.dmlc.xgboost4j.java.Booster.predict(Booster.java:381)
	at ml.dmlc.xgboost4j.scala.Booster.predict(Booster.scala:172)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel.ml$dmlc$xgboost4j$scala$spark$XGBoostRegressionModel$$producePredictionItrs(XGBoostRegressor.scala:381)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.$anonfun$batchIterImpl$1(XGBoostRegressor.scala:310)
	at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:322)
	at ml.dmlc.xgboost4j.scala.spark.XGBoostRegressionModel$$anon$1.next(XGBoostRegressor.scala:278)
	at scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anon$1.hasNext(WholeStageCodegenExec.scala:729)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:98)
	at org.apache.spark.sql.execution.columnar.CachedRDDBuilder$$anon$1.next(InMemoryRelation.scala:90)
	at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:222)
	at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:299)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1371)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1298)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1362)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1186)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:360)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:311)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:349)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:313)
	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
	at org.apache.spark.scheduler.Task.run(Task.scala:127)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:444)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:447)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)
20/07/22 21:50:01 INFO ShutdownHookManager: Shutdown hook called
20/07/22 21:50:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-34f9a30d-5a06-4250-8430-29d5461290ae
20/07/22 21:50:01 INFO ShutdownHookManager: Deleting directory /tmp/spark-c0e5a7b0-7d65-44bb-a064-577e88e4c02d

xgboost4j_spark_crash_log.txt

I used #5925 to log all invocations of the C API functions.

All C API invocations:

[21:49:58] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:49:58] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
[21:49:58] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
[21:49:59] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
[21:50:00] [XGBoost C API invocation] int XGBoosterGetModelRaw(BoosterHandle, xgboost::bst_ulong*, const char**)
[21:50:00] [XGBoost C API invocation] int XGBoosterCreate(void* const*, xgboost::bst_ulong, void**)
[21:50:00] [XGBoost C API invocation] int XGBoosterLoadModelFromBuffer(BoosterHandle, const void*, xgboost::bst_ulong)
[21:50:00] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixNumRow(DMatrixHandle, xgboost::bst_ulong*)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)
[21:50:01] [XGBoost C API invocation] int XGDMatrixCreateFromDataIter(void*, int (*)(DataIterHandle, int (*)(DataHolderHandle, XGBoostBatchCSR), DataHolderHandle), const char*, void**)
[21:50:01] [XGBoost C API invocation] xgboost::data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext, XGBoostBatchCSR>::Next()::<lambda(void*, XGBoostBatchCSR)> [with DataIterHandle = void*; XGBCallbackDataIterNext = int(void*, int (*)(void*, XGBoostBatchCSR), void*); XGBoostBatchCSR = XGBoostBatchCSR]
[21:50:01] [XGBoost C API invocation] int XGBoosterPredict(BoosterHandle, DMatrixHandle, int, unsigned int, int, xgboost::bst_ulong*, const bst_float**)
[21:50:01] [XGBoost C API invocation] int XGDMatrixFree(DMatrixHandle)

@hcho3
Collaborator

hcho3 commented Jul 23, 2020

@trivialfis @RAMitchell We have an issue with the iterator adaptor. Consider a CSR batch consisting 32768 rows whose the last row is empty (no non-zero element). The common::ParallelGroupBuilder() function (used in SparsePage::Push()) deduces the number of rows in the batch to be 32767, because the last row contained no non-zero element. On the other hand, the weight vector is initialized to be a vector of size 32768 filled with 1.0, because all data points get 1.0 weight by default.

We will need to handle empty trailing rows or columns carefully.

@trivialfis
Member

@hcho3 Glad that you are taking this over.

@hcho3 hcho3 changed the title Xgboost4spark 1.1.1 broken and consistently does not work [jvm-packages] Xgboost4spark 1.1.1 broken and consistently does not work Jul 23, 2020
@hcho3
Collaborator

hcho3 commented Jul 23, 2020

The most minimal example: apply the following patch to the C++ unit test:

diff --git tests/cpp/data/test_adapter.cc tests/cpp/data/test_adapter.cc
index de835358..1da2a71c 100644
--- tests/cpp/data/test_adapter.cc
+++ tests/cpp/data/test_adapter.cc
@@ -73,10 +73,11 @@ class CSRIterForTest {
   std::vector<std::remove_pointer<decltype(std::declval<XGBoostBatchCSR>().index)>::type>
       feature_idx_ {0, 1, 0, 1, 1};
   std::vector<std::remove_pointer<decltype(std::declval<XGBoostBatchCSR>().offset)>::type>
-      row_ptr_ {0, 2, 4, 5};
+      row_ptr_ {0, 2, 4, 5, 5};
   size_t iter_ {0};
 
  public:
+  size_t static constexpr kRows { 4 };  // Test for the last row being empty
   size_t static constexpr kCols { 13 };  // Test for having some missing columns
 
   XGBoostBatchCSR Next() {
@@ -88,7 +89,7 @@ class CSRIterForTest {
     batch.offset = dmlc::BeginPtr(row_ptr_);
     batch.index = dmlc::BeginPtr(feature_idx_);
     batch.value = dmlc::BeginPtr(data_);
-    batch.size = 3;
+    batch.size = kRows;
 
     batch.label = nullptr;
     batch.weight = nullptr;
@@ -117,11 +118,11 @@ int CSRSetDataNextForTest(DataIterHandle data_handle,
   }
 }
 
-TEST(Adapter, IteratorAdaper) {
+TEST(Adapter, IteratorAdapter) {
   CSRIterForTest iter;
   data::IteratorAdapter<DataIterHandle, XGBCallbackDataIterNext,
                         XGBoostBatchCSR> adapter{&iter, CSRSetDataNextForTest};
-  constexpr size_t kRows { 6 };
+  constexpr size_t kRows { 8 };
 
   std::unique_ptr<DMatrix> data {
     DMatrix::Create(&adapter, std::numeric_limits<float>::quiet_NaN(), 1)
@@ -129,4 +130,5 @@ TEST(Adapter, IteratorAdaper) {
   ASSERT_EQ(data->Info().num_col_, CSRIterForTest::kCols);
   ASSERT_EQ(data->Info().num_row_, kRows);
 }
+
 }  // namespace xgboost

Log from ./build/testxgboost --gtest_filter=Adapter.IteratorAdapter:

[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from Adapter
[ RUN      ] Adapter.IteratorAdapter
/home/ubuntu/xgboost/tests/cpp/data/test_adapter.cc:131: Failure
Expected equality of these values:
  data->Info().num_row_
    Which is: 7
  kRows
    Which is: 8
[  FAILED  ] Adapter.IteratorAdapter (0 ms)
[----------] 1 test from Adapter (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test suite ran. (0 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Adapter.IteratorAdapter

 1 FAILED TEST

The example shows a matrix where Row IDs 3 and 7 are empty.

The NativeDataIter in XGBoost 0.90 used a subclass of dmlc::Parser<> and thus handled empty rows correctly.

@hcho3
Copy link
Collaborator

hcho3 commented Jul 23, 2020

The bug affects IteratorAdapter and FileAdapter, because both return kAdapterUnknownSize from their NumRows() method. The other three adapters have a NumRows() method that returns a concrete count, so they are not affected by the bug.

Minimal example for FileAdapter:

diff --git tests/cpp/data/test_simple_dmatrix.cc tests/cpp/data/test_simple_dmatrix.cc
index 691dc854..563a4949 100644
--- tests/cpp/data/test_simple_dmatrix.cc
+++ tests/cpp/data/test_simple_dmatrix.cc
@@ -185,16 +185,21 @@ TEST(SimpleDMatrix, FromCSC) {
 TEST(SimpleDMatrix, FromFile) {
   std::string filename = "test.libsvm";
   CreateBigTestData(filename, 3 * 5);
+  {
+    std::ofstream fo(filename, std::ios::app | std::ios::out);
+    fo << "0\n";
+  }
+  constexpr size_t expected_nrow = 6;
   std::unique_ptr<dmlc::Parser<uint32_t>> parser(
       dmlc::Parser<uint32_t>::Create(filename.c_str(), 0, 1, "auto"));
 
   auto verify_batch = [](SparsePage const &batch) {
-    EXPECT_EQ(batch.Size(), 5);
+    EXPECT_EQ(batch.Size(), expected_nrow);
     EXPECT_EQ(batch.offset.HostVector(),
-              std::vector<bst_row_t>({0, 3, 6, 9, 12, 15}));
+              std::vector<bst_row_t>({0, 3, 6, 9, 12, 15, 15}));
     EXPECT_EQ(batch.base_rowid, 0);
 
-    for (auto i = 0ull; i < batch.Size(); i++) {
+    for (auto i = 0ull; i < batch.Size() - 1; i++) {
       if (i % 2 == 0) {
         EXPECT_EQ(batch[i][0].index, 0);
         EXPECT_EQ(batch[i][1].index, 1);

@hcho3

This comment has been minimized.

@hcho3
Collaborator

hcho3 commented Jul 23, 2020

@ranInc Hello, I submitted a pull request #5929 to fix the issue. If you'd like to try it out, run

git clone --recursive https://github.com/hcho3/xgboost -b handle_empty_rows
cd xgboost/jvm-packages
mvn package     # or mvn install

@ranInc
Author

ranInc commented Jul 23, 2020

Hi, I think I will be able to test it out next week. Thanks!

@ranInc
Author

ranInc commented Jul 26, 2020

Hi,
so I guess I need to wait for the 1.2 release or use a non-release build of 1.2?

@hcho3
Collaborator

hcho3 commented Jul 26, 2020

@ranInc Yes. You can either wait for the 1.2.0 release or use the SNAPSHOT version.
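For reference, depending on the SNAPSHOT version from Maven would look roughly like the pom.xml fragment below. This is a sketch only: the artifact coordinates follow the ml.dmlc naming used by the JVM packages, and the repository URL is a placeholder — substitute the project's actual snapshot repository from its documentation.

```xml
<!-- Sketch only: the repository URL is a placeholder, not the real
     XGBoost snapshot repository; see the project docs for the actual one. -->
<repositories>
  <repository>
    <id>xgboost-snapshots</id>
    <url>https://example.com/xgboost-maven-repo/snapshot</url>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>ml.dmlc</groupId>
    <artifactId>xgboost4j-spark_2.12</artifactId>
    <version>1.2.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```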

@Bishop-Cui

@ranInc Yes. You can either wait for the 1.2.0 release or use the SNAPSHOT version.

Hello Hyunsu, I ran into the same problem that ranInc reported, on 1.1.2. It's tricky because our environment is fixed, so we cannot use 1.2.0, and when I tried to download the handle_empty_rows branch in your fork, it could not be found. Would you happen to have any suggestions?

@hcho3
Collaborator

hcho3 commented Aug 17, 2023

The bug has long been fixed, starting from 1.2.0. Please upgrade to the latest XGBoost; we are not able to support very old versions.
