Refactor query param #2519

Merged (2 commits) on Mar 13, 2024


Collaborator

@noCharger noCharger commented Feb 13, 2024

Description

  1. Refactor SparkSubmitParameters
  2. Move the query param from spark/src/main/java/org/opensearch/sql/spark/client/StartJobRequest.java into spark/src/main/java/org/opensearch/sql/spark/asyncquery/model/SparkSubmitParameters.java to avoid the input limitation from EMR-S
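
As a rough illustration of item 2, the query can travel as one more `--conf` entry in the submit-parameter string instead of as a separate field on the job request. The sketch below is not the code from this PR: the class `SubmitParamsSketch` and its methods are hypothetical, and only the key `spark.flint.job.query` is taken from the logs later in this conversation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-in for a submit-parameter builder; not the real SparkSubmitParameters.
class SubmitParamsSketch {
    // Conf key as it appears in the job logs in this conversation.
    static final String QUERY_KEY = "spark.flint.job.query";

    // Insertion-ordered so the rendered string is deterministic.
    private final Map<String, String> config = new LinkedHashMap<>();

    SubmitParamsSketch conf(String key, String value) {
        config.put(key, value);
        return this;
    }

    // The query becomes an ordinary conf entry.
    SubmitParamsSketch query(String query) {
        return conf(QUERY_KEY, query);
    }

    // Values are emitted verbatim; whether they need quoting depends on
    // how the consumer tokenizes the resulting string.
    @Override
    public String toString() {
        return config.entrySet().stream()
            .map(e -> "--conf " + e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        String params = new SubmitParamsSketch()
            .conf("spark.executor.instances", "2")
            .query("show tables")
            .toString();
        System.out.println(params);
    }
}
```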

Issues Resolved

#2376

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


codecov bot commented Feb 13, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.43%. Comparing base (1a09f96) to head (7c86b3a).
Report is 1 commit behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2519      +/-   ##
============================================
+ Coverage     95.42%   95.43%   +0.01%     
- Complexity     5027     5031       +4     
============================================
  Files           483      484       +1     
  Lines         14020    14034      +14     
  Branches        944      940       -4     
============================================
+ Hits          13378    13394      +16     
+ Misses          621      619       -2     
  Partials         21       21              
Flag Coverage Δ
sql-engine 95.43% <100.00%> (+0.01%) ⬆️


@noCharger
Collaborator Author

noCharger commented Feb 14, 2024

Confirmed that the BWC failure is unrelated to this PR: the same failure reproduces in a shadow PR created from the main branch. https://github.com/opensearch-project/sql/actions/runs/7894433419/job/21545080478?pr=2520

@noCharger
Collaborator Author

noCharger commented Mar 10, 2024

Rebased on main. @vamsi-amazon @penghuo @dai-chen, please review.

./gradlew clean build -x integ-test:integTest -x :doctest:doctest

BUILD SUCCESSFUL in 9m 12s
179 actionable tasks: 176 executed, 3 up-to-date

Signed-off-by: Louis Chu <[email protected]>
@noCharger noCharger force-pushed the refactor-query-param branch 3 times, most recently from 1cc9819 to 7592f7a Compare March 12, 2024 23:01
penghuo previously approved these changes Mar 12, 2024
Signed-off-by: Louis Chu <[email protected]>
@penghuo penghuo merged commit ee2dbd5 into opensearch-project:main Mar 13, 2024
29 of 30 checks passed
opensearch-trigger-bot bot pushed a commit that referenced this pull request Mar 13, 2024
* Refactor query param

Signed-off-by: Louis Chu <[email protected]>

* Reduce scope of changes

Signed-off-by: Louis Chu <[email protected]>

---------

Signed-off-by: Louis Chu <[email protected]>
(cherry picked from commit ee2dbd5)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@seankao-az
Collaborator

Documenting some findings on an issue with batch Flint jobs on EMR-S after this PR.

Batch jobs submitted to EMR-S fail.

To get a clearer error message, I modified this line:
https://github.com/opensearch-project/opensearch-spark/blob/e6a97dcc8c248788b7afc7843f90282f4f8db7c4/spark-sql-application/src/main/scala/org/apache/spark/sql/FlintJob.scala#L50
to:

throw new IllegalArgumentException(
  s"""Unsupported number of arguments. Expected 1 or 2 arguments. ${args.length}, ${args.mkString(" ")}""")

Judging from the error message, this may be related to how EMR parses the Spark submit parameters.

The full stderr output of the batch job, for two example queries:

Example 1:

--conf spark.flint.job.query=refresh skipping index on mys3.default.http_logs
Files  s3://flint-dev-seankao/opensearch-spark-jars/opensearch-spark-standalone_2.12-0.3.0-SNAPSHOT.jar from /tmp/spark-b28da86c-b55e-45e6-9aa5-48432a15c19e/opensearch-spark-standalone_2.12-0.3.0-SNAPSHOT.jar to /home/hadoop/./opensearch-spark-standalone_2.12-0.3.0-SNAPSHOT.jar
Files  s3://flint-dev-seankao/opensearch-spark-jars/opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar from /tmp/spark-b28da86c-b55e-45e6-9aa5-48432a15c19e/opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar to /home/hadoop/./opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar
Files  s3://flint-dev-seankao/opensearch-spark-jars/opensearch-spark-ppl_2.12-0.3.0-SNAPSHOT.jar from /tmp/spark-b28da86c-b55e-45e6-9aa5-48432a15c19e/opensearch-spark-ppl_2.12-0.3.0-SNAPSHOT.jar to /home/hadoop/./opensearch-spark-ppl_2.12-0.3.0-SNAPSHOT.jar
24/03/18 21:45:32 WARN DependencyUtils: Local jar /home/hadoop/skipping does not exist, skipping.
Exception in thread "main" java.lang.IllegalArgumentException: Unsupported number of arguments. Expected 1 or 2 arguments. 5, index on mys3.default.http_logs file:///home/hadoop/.ivy2/jars/org.opensearch_opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar query_execution_result_mys3
	at org.apache.spark.sql.FlintJob$.main(FlintJob.scala:50)
	at org.apache.spark.sql.FlintJob.main(FlintJob.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1066)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1158)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1167)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/03/18 21:45:32 INFO ShutdownHookManager: Shutdown hook called
24/03/18 21:45:32 INFO ShutdownHookManager: Deleting directory /tmp/spark-b28da86c-b55e-45e6-9aa5-48432a15c19e

Example 2:

--conf spark.flint.job.query=select * from mys3.default.http_logs limit 10
Files  s3://flint-dev-seankao/opensearch-spark-jars/opensearch-spark-standalone_2.12-0.3.0-SNAPSHOT.jar from /tmp/spark-89b54a98-e1e7-47dd-849e-a16894ccfa03/opensearch-spark-standalone_2.12-0.3.0-SNAPSHOT.jar to /home/hadoop/./opensearch-spark-standalone_2.12-0.3.0-SNAPSHOT.jar
Files  s3://flint-dev-seankao/opensearch-spark-jars/opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar from /tmp/spark-89b54a98-e1e7-47dd-849e-a16894ccfa03/opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar to /home/hadoop/./opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar
Files  s3://flint-dev-seankao/opensearch-spark-jars/opensearch-spark-ppl_2.12-0.3.0-SNAPSHOT.jar from /tmp/spark-89b54a98-e1e7-47dd-849e-a16894ccfa03/opensearch-spark-ppl_2.12-0.3.0-SNAPSHOT.jar to /home/hadoop/./opensearch-spark-ppl_2.12-0.3.0-SNAPSHOT.jar
24/03/18 21:53:18 WARN DependencyUtils: Local jar /home/hadoop/* does not exist, skipping.
Exception in thread "main" java.lang.IllegalArgumentException: Unsupported number of arguments. Expected 1 or 2 arguments. 8, from mys3.default.http_logs limit 10 --conf spark.executorEnv.JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto.x86_64/ file:///home/hadoop/.ivy2/jars/org.opensearch_opensearch-spark-sql-application_2.12-0.3.0-SNAPSHOT.jar query_execution_result_mys3
	at org.apache.spark.sql.FlintJob$.main(FlintJob.scala:50)
	at org.apache.spark.sql.FlintJob.main(FlintJob.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:568)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1066)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1158)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1167)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
24/03/18 21:53:18 INFO ShutdownHookManager: Shutdown hook called
24/03/18 21:53:18 INFO ShutdownHookManager: Deleting directory /tmp/spark-89b54a98-e1e7-47dd-849e-a16894ccfa03
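
Both logs are consistent with the unquoted query being split on whitespace: the first word stays in the `--conf` value, the second is taken as the application jar (see the `DependencyUtils` warnings for `skipping` and `*`), and the remaining words reach `FlintJob` as extra arguments. The sketch below reproduces that reading for Example 1. It is a simplified, hypothetical model, not the actual EMR or spark-submit parser, and `app.jar` stands in for the real jar path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of spark-submit style argument handling.
class SubmitArgsSketch {
    static final class Parsed {
        final List<String> confs = new ArrayList<>();
        String primaryResource;                 // first non-option token, normally the application jar
        final List<String> appArgs = new ArrayList<>();
    }

    // Naive tokenization: split on runs of whitespace, ignoring quotes entirely.
    static List<String> splitOnWhitespace(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    static Parsed parse(List<String> tokens) {
        Parsed p = new Parsed();
        int i = 0;
        while (i < tokens.size()) {
            String t = tokens.get(i);
            if (p.primaryResource != null) {
                // Everything after the primary resource is an application argument.
                p.appArgs.add(t);
                i++;
            } else if (t.equals("--conf") && i + 1 < tokens.size()) {
                // "--conf" consumes exactly one following token as key=value.
                p.confs.add(tokens.get(i + 1));
                i += 2;
            } else {
                p.primaryResource = t;
                i++;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        String params =
            "--conf spark.flint.job.query=refresh skipping index on mys3.default.http_logs"
                + " app.jar query_execution_result_mys3";
        Parsed p = parse(splitOnWhitespace(params));
        System.out.println("confs           = " + p.confs);
        System.out.println("primaryResource = " + p.primaryResource);
        System.out.println("appArgs (" + p.appArgs.size() + ") = " + p.appArgs);
    }
}
```

Under this model the job receives five application arguments, the same count as in the Example 1 error message.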

@seankao-az
Collaborator

--conf spark.flint.job.query="refresh skipping index on mys3.default.http_logs"

After wrapping the query in quotes as shown above, the job succeeds.
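
A minimal sketch of that workaround, assuming a shell-like tokenizer that keeps double-quoted text together: the hypothetical helper `quotedConf` wraps the value in double quotes, and `tokenize` shows that the query then survives as a single token. Escaping embedded quotes and backslashes is an extra assumption here. This thread only confirms that plain double quotes worked for this query.

```java
import java.util.ArrayList;
import java.util.List;

class QuotedConfSketch {
    // Hypothetical helper: builds a --conf entry whose value is wrapped in double quotes.
    static String quotedConf(String key, String value) {
        // Escape backslashes first, then double quotes.
        String escaped = value.replace("\\", "\\\\").replace("\"", "\\\"");
        return "--conf " + key + "=\"" + escaped + "\"";
    }

    // Minimal shell-like tokenizer: splits on whitespace outside double quotes
    // and treats a backslash inside quotes as escaping the next character.
    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        boolean hasToken = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (inQuotes && c == '\\' && i + 1 < s.length()) {
                cur.append(s.charAt(++i));
            } else if (c == '"') {
                inQuotes = !inQuotes;
                hasToken = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                if (hasToken) {
                    out.add(cur.toString());
                    cur.setLength(0);
                    hasToken = false;
                }
            } else {
                cur.append(c);
                hasToken = true;
            }
        }
        if (hasToken) {
            out.add(cur.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String conf = quotedConf(
            "spark.flint.job.query", "refresh skipping index on mys3.default.http_logs");
        System.out.println(conf);
        System.out.println(tokenize(conf));
    }
}
```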

noCharger added a commit to noCharger/sql that referenced this pull request Mar 19, 2024
* Refactor query param

Signed-off-by: Louis Chu <[email protected]>

* Reduce scope of changes

Signed-off-by: Louis Chu <[email protected]>

---------

Signed-off-by: Louis Chu <[email protected]>
@noCharger noCharger added the v2.13.0 label Mar 19, 2024
penghuo pushed a commit that referenced this pull request Mar 19, 2024
* Refactor query param

* Reduce scope of changes

---------

(cherry picked from commit ee2dbd5)

Signed-off-by: Louis Chu <[email protected]>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Labels: backport 2.x, v2.13.0