This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-383] Release SMJ input data immediately after being used #387

Merged: 3 commits, Jul 23, 2021


zhztheplayer
Collaborator

No description provided.

@github-actions

#383

@zhztheplayer zhztheplayer changed the title [NSE-383] Release SMJ input data immediately after being used [NSE-383] WIP Release SMJ input data immediately after being used Jun 30, 2021
@zhztheplayer zhztheplayer changed the title [NSE-383] WIP Release SMJ input data immediately after being used [NSE-383] WIP: Release SMJ input data immediately after being used Jun 30, 2021
@zhztheplayer
Collaborator Author

zhztheplayer commented Jul 1, 2021

Arrow patch oap-project/arrow#25

@zhztheplayer
Collaborator Author

@zhouyuan @weiting-chen Another required change: this branch needs arrow-dataset-jni.so to be dynamically linked. That may alter the standard build process, so we may need a separate discussion.

@zhouyuan
Collaborator

zhouyuan commented Jul 5, 2021

@zhztheplayer
I found that the gap is mostly that SMJ needs to "look back" at earlier values when there are duplicate values, which may not be friendly to iterators. I'll clean this up first.
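To illustrate the "look back" problem, here is a minimal sketch (not Gazelle's actual kernel) of an inner sort-merge join over two pull-based iterators. It assumes both inputs are sorted ascending by key. `Row`, `RowIterator`, `FromVector`, and `SortMergeJoin` are hypothetical names introduced only for this example. The idea is to buffer only the run of right-side rows that share the current key, replay that run for each matching left row, and drop it as soon as the left key moves on.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical row type: a join key plus a payload value.
struct Row {
  int key;
  int value;
};

// Hypothetical pull-based source: fills *out and returns true,
// or returns false once the input is exhausted.
using RowIterator = std::function<bool(Row*)>;

// Helper for the example: wraps a vector as a RowIterator.
RowIterator FromVector(std::vector<Row> rows) {
  auto data = std::make_shared<std::vector<Row>>(std::move(rows));
  auto pos = std::make_shared<std::size_t>(0);
  return [data, pos](Row* out) -> bool {
    if (*pos >= data->size()) return false;
    *out = (*data)[(*pos)++];
    return true;
  };
}

// Inner join of two inputs sorted ascending by key (an assumption of this sketch).
std::vector<std::pair<Row, Row>> SortMergeJoin(RowIterator left, RowIterator right) {
  std::vector<std::pair<Row, Row>> out;
  std::vector<Row> run;  // right rows sharing the key currently being joined
  Row l{0, 0};
  Row r{0, 0};
  bool has_l = left(&l);
  bool has_r = right(&r);
  while (has_l) {
    if (!run.empty() && run.front().key == l.key) {
      // "Look back": replay the buffered run for another left row with the same key.
      for (const Row& b : run) out.emplace_back(l, b);
      has_l = left(&l);
      continue;
    }
    run.clear();  // the left key moved on, so the old run can be released
    while (has_r && r.key < l.key) has_r = right(&r);  // skip unmatched right rows
    if (!has_r) break;  // nothing left on the right that could match
    if (r.key > l.key) {  // current left row has no match
      has_l = left(&l);
      continue;
    }
    // Keys are equal: buffer the whole right-side run for this key.
    while (has_r && r.key == l.key) {
      run.push_back(r);
      has_r = right(&r);
    }
  }
  return out;
}
```

With this shape, only one duplicate-key run is held at a time, so input consumed before that run does not have to stay alive.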

@zhztheplayer
Collaborator Author

I'll check out the Arrow JNI iterator code to avoid introducing more binary dependencies.

@zhztheplayer
Collaborator Author

@zhouyuan Based on the previous discussion, I am about to get the Arrow JNI util refactor working: https://github.com/oap-project/arrow/pull/27/files. As a result, we won't have to change the C++ dependencies of Gazelle.

@@ -568,7 +592,7 @@ case class ColumnarWholeStageCodegenExec(child: SparkPlan)(val codegenStageId: I
 def close = {
   closed = true
   pipelineTime += (eval_elapse + build_elapse) / 1000000
-  buildRelationBatchHolder.foreach(_.close)
+  buildRelationBatchHolder.foreach(_.close) // fixing: ref cnt goes negative
Collaborator


This no longer seems necessary, since the iterator will be closed when the kernel closes?

Collaborator Author


It seems BHJ and SHJ still use this for caching data.
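For context on the "ref cnt goes negative" concern: when two owners (for example, a holder and a kernel-owned iterator) both close the same batch, the count can be decremented twice. One common guard, sketched below, is to make the holder's close idempotent. This is an illustration only, not the code in this PR. `RefCountedBatch` and `BatchHolder` are hypothetical stand-ins for the real classes.

```cpp
#include <stdexcept>

// Hypothetical stand-in for a reference-counted columnar batch.
class RefCountedBatch {
 public:
  void Retain() { ++ref_count_; }

  // Drops one reference; refuses to go below zero so that a double release
  // is reported instead of silently corrupting the count.
  void Release() {
    if (ref_count_ == 0) {
      throw std::logic_error("reference count would go negative");
    }
    --ref_count_;
  }

  int ref_count() const { return ref_count_; }

 private:
  int ref_count_ = 1;  // the creator holds the first reference
};

// Hypothetical holder that releases its reference at most once,
// no matter how many times Close() is called.
class BatchHolder {
 public:
  explicit BatchHolder(RefCountedBatch* batch) : batch_(batch) {}

  void Close() {
    if (closed_) return;  // a second close is a no-op
    closed_ = true;
    batch_->Release();
  }

 private:
  RefCountedBatch* batch_;
  bool closed_ = false;
};
```

A guard like this makes repeated closes harmless, but it does not settle which component should own the release. That ownership question is what the review comments here are about.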

@@ -586,7 +586,7 @@ arrow::Status CompileCodes(std::string codes, std::string signature) {
 char* env_codegen_option_ = std::getenv("CODEGEN_OPTION");

 if (env_codegen_option_ == nullptr) {
-  env_codegen_option_ = " -O3 -march=native ";
+  env_codegen_option_ = " -O0 -g ";
Collaborator Author


This change is not intended. I will remove it.
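The hunk above reads the compiler flags for generated code from the `CODEGEN_OPTION` environment variable and falls back to a built-in default when it is unset. A minimal sketch of that lookup follows. It keeps the ` -O3 -march=native ` default and returns a `std::string` instead of assigning a string literal to a `char*`. `GetCodegenOption` and `kDefaultCodegenOption` are hypothetical names, not part of the project.

```cpp
#include <cstdlib>
#include <string>

// Default flags, matching the optimized setting shown in the hunk.
const char kDefaultCodegenOption[] = " -O3 -march=native ";

// Returns the flags from CODEGEN_OPTION if it is set, otherwise the default.
std::string GetCodegenOption() {
  const char* env = std::getenv("CODEGEN_OPTION");
  return env == nullptr ? std::string(kDefaultCodegenOption) : std::string(env);
}
```

With a lookup like this, debug flags such as ` -O0 -g ` can be supplied through the environment for a single run, so the compiled-in default does not need to change.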

@zhouyuan
Collaborator

@zhztheplayer rebase?

@zhztheplayer zhztheplayer changed the title [NSE-383] WIP: Release SMJ input data immediately after being used [NSE-383] Release SMJ input data immediately after being used Jul 23, 2021
Also, add a switch option, spark.oap.sql.columnar.sortmergejoin.lazyread, to the Spark config.
@zhztheplayer
Collaborator Author

zhztheplayer commented Jul 23, 2021

I added a switch to enable this: set spark.oap.sql.columnar.sortmergejoin.lazyread=true.

@zhouyuan zhouyuan merged commit 6b12a93 into oap-project:master Jul 23, 2021