Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce SparkParameterComposerCollection #2774

Merged
merged 3 commits into from
Jul 12, 2024

Conversation

ykmr1224
Copy link
Collaborator

@ykmr1224 ykmr1224 commented Jun 24, 2024

Description

  • Introduce SparkParameterComposerCollection to abstract the datasource type specific logics from async-query-core
  • DatasourceType specific parameters can be implemented in DataSourceSparkParameterComposer and registered to SparkParameterComposerCollection.
  • SparkParameterComposerCollection will hold all the implementation of DataSourceSparkParameterComposer and GeneralSparkParameterComposer, and apply them when composeByDatasource and compose is called.
  • I'm planning to remove SparkSubmitParameterModifier in separate PR since Composer will cover the use case.

Issues Resolved

n/a

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@ykmr1224 ykmr1224 force-pushed the dqs/datasource-extension branch from fb6ded6 to 8d16732 Compare June 24, 2024 23:47
@ykmr1224 ykmr1224 added backport 2.x maintenance Improves code quality, but not the product labels Jun 24, 2024
@ykmr1224 ykmr1224 force-pushed the dqs/datasource-extension branch from 8d16732 to 607ac20 Compare June 25, 2024 15:28
@ykmr1224 ykmr1224 marked this pull request as ready for review June 25, 2024 19:34
import lombok.Setter;

/** Define Spark Submit Parameters. */
public class SparkSubmitParameters {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved from:
async-query-core/src/main/java/org/opensearch/sql/spark/asyncquery/model/SparkSubmitParameters.java
And builder was extracted as separate class.

import org.opensearch.sql.spark.dispatcher.model.DispatchQueryRequest;
import org.opensearch.sql.spark.execution.statestore.OpenSearchStateStoreUtil;

public class SparkSubmitParametersBuilder {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic in this builder was originally within SparkSubmitParameters class.
S3GLUE related configs are extracted to S3GlueDataSourceSparkParameterComposer

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it GeneralSparkParameterComposer? Hive and S3 relation configuration should in datasource composer?

/** Stores Spark parameter composers and dispatch compose request to each composer */
public class SparkParameterComposerCollection {
Collection<GeneralSparkParameterComposer> generalComposers = new ArrayList<>();
Map<DataSourceType, Collection<DataSourceSparkParameterComposer>> datasourceComposers =
Copy link
Collaborator

@penghuo penghuo Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need map? how about Collection of DataSourceSparkParameterComposer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to lookup by DataSourceType. Do you mean we want to lookup using linear search?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, i mean why not use Collection of DataSourceSparkParameterComposer.
understood now,

import org.opensearch.sql.spark.dispatcher.model.DispatchQueryRequest;
import org.opensearch.sql.spark.execution.statestore.OpenSearchStateStoreUtil;

public class SparkSubmitParametersBuilder {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it GeneralSparkParameterComposer? Hive and S3 relation configuration should in datasource composer?

SparkSubmitParameters sparkSubmitParameters,
DispatchQueryRequest dispatchQueryRequest,
AsyncQueryRequestContext context) {
settingLoader
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need Setting loader, why not directly access setting in here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it had access from two places, I extracted as a class to avoid redundancy. (It include AccessController.doPrivileged, etc.)


@AllArgsConstructor
public class OpenSearchSparkSubmitParameterModifier implements SparkSubmitParameterModifier {

private String extraParameters;

@Override
public void modifyParameters(SparkSubmitParameters parameters) {
parameters.setExtraParameters(this.extraParameters);
public void modifyParameters(SparkSubmitParametersBuilder builder) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could u explain more on Modifier and Composer and add to PR description. are these necessary? could we simply the concept?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added description. I am planning to delete Modifier since Generic composer will cover the use case.


/**
* Interface for extension point to allow modification of spark submit parameter. modifyParameter
* method is called after the default spark submit parameter is build.
* method is called after the default spark submit parameter is build. To be deprecated in favor of
* {@link org.opensearch.sql.spark.parameter.GeneralSparkParameterComposer}
*/
public interface SparkSubmitParameterModifier {
Copy link
Member

@vmmusings vmmusings Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When does modifier come into picture?
So we have two concepts composer and modifier. is modifier used after the completion of building spark submit parameters using composer?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking to remove Modifier once I can verify the Composer can satisfy the use cases.

@vmmusings
Copy link
Member

Please resolve the conflict.

@ykmr1224 ykmr1224 force-pushed the dqs/datasource-extension branch from 8a77cc1 to 5dac199 Compare June 28, 2024 21:14
penghuo
penghuo previously approved these changes Jun 28, 2024
vmmusings
vmmusings previously approved these changes Jun 28, 2024
@ykmr1224 ykmr1224 dismissed stale reviews from vmmusings and penghuo via f2f0cf5 June 28, 2024 22:52
vmmusings
vmmusings previously approved these changes Jun 29, 2024
ykmr1224 added 3 commits June 28, 2024 22:54
Signed-off-by: Tomoyuki Morita <[email protected]>
Signed-off-by: Tomoyuki Morita <[email protected]>
@ykmr1224 ykmr1224 force-pushed the dqs/datasource-extension branch from fe9b35f to 9f2162e Compare June 29, 2024 05:59
@ykmr1224 ykmr1224 merged commit a151a7d into opensearch-project:main Jul 12, 2024
13 of 20 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/sql/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/sql/backport-2.x
# Create a new branch
git switch --create backport/backport-2774-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 a151a7d4484134afb597473871a56a831ffbf323
# Push it to GitHub
git push --set-upstream origin backport/backport-2774-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/sql/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-2774-to-2.x.

ykmr1224 added a commit to ykmr1224/sql that referenced this pull request Jul 12, 2024
* Introduce SparkParameterComposerCollection

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix comments

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix integ test

Signed-off-by: Tomoyuki Morita <[email protected]>

---------

Signed-off-by: Tomoyuki Morita <[email protected]>
(cherry picked from commit a151a7d)
ykmr1224 added a commit that referenced this pull request Jul 16, 2024
* Introduce SparkParameterComposerCollection

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix comments

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix integ test

Signed-off-by: Tomoyuki Morita <[email protected]>

---------

Signed-off-by: Tomoyuki Morita <[email protected]>
(cherry picked from commit a151a7d)
manasvinibs pushed a commit to manasvinibs/sql that referenced this pull request Aug 14, 2024
* Introduce SparkParameterComposerCollection

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix comments

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix integ test

Signed-off-by: Tomoyuki Morita <[email protected]>

---------

Signed-off-by: Tomoyuki Morita <[email protected]>
jzonthemtn pushed a commit to jzonthemtn/sql that referenced this pull request Aug 28, 2024
* Introduce SparkParameterComposerCollection

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix comments

Signed-off-by: Tomoyuki Morita <[email protected]>

* Fix integ test

Signed-off-by: Tomoyuki Morita <[email protected]>

---------

Signed-off-by: Tomoyuki Morita <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x backport-failed maintenance Improves code quality, but not the product
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants