
Parallel fragment exec instance #851

Merged (5 commits) on Apr 9, 2019
Conversation

kangkaisen (Contributor)

For #831

Review threads on fe/src/main/java/org/apache/doris/qe/Coordinator.java (outdated, resolved):
olapScanNode.getOlapTable().getName(), olapScanNode.getFragmentId());
return false;
}
}
Contributor:

Does this work? If the fragment is agg(hash(scan1, scan2)), then for scan2 the RootNode of destFragment is the agg node.

If it isn't easy to match this case, I think we can just return true for it.

For a broadcast join, I think the bigger impact is that the outer child will run in parallel, and that requires building more hash tables for the inner child. So it's OK to scan in parallel here.
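
To make the concern concrete, here is a self-contained sketch of the root-only check being discussed and why it misses this plan shape. The class and method names are illustrative stand-ins, not the actual Coordinator.java code.

import java.util.List;

public class ParallelScanCheckSketch {
    record PlanNode(String kind, List<PlanNode> children) {}

    // Original idea: allow multiple scan instances unless the root of the
    // scan's destination fragment is a hash join (i.e. the scan probably
    // feeds the build side of a broadcast join).
    static boolean canParallelScan(PlanNode destFragmentRoot) {
        return !destFragmentRoot.kind().equals("HASH_JOIN");
    }

    public static void main(String[] args) {
        PlanNode scan1 = new PlanNode("SCAN", List.of());
        PlanNode scan2 = new PlanNode("SCAN", List.of());
        PlanNode join  = new PlanNode("HASH_JOIN", List.of(scan1, scan2));
        PlanNode agg   = new PlanNode("AGG", List.of(join));
        // For agg(hash(scan1, scan2)) the destination fragment's root is the
        // agg node, so the root-only check never sees the hash join and
        // reports "parallel OK" for scan2 -- the case raised above.
        System.out.println(canParallelScan(agg)); // prints: true
    }
}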

kangkaisen (Contributor, Author):

Yes, it doesn't work for agg(hash(scan1, scan2)).

So, should we parallelize all queries?
In my tests, even when we parallelize a small table scan, the performance loss is very small.

Contributor:

For simplicity, I think we can parallelize all queries.
However, one thing we should note is that when we enable fragment parallelism, the FE will send more PlanFragmentInstance requests than before, which costs the FE more CPU to serialize them and more network bandwidth to transfer them. As the number of concurrent queries increases, the FE could become a bottleneck.
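
A rough back-of-the-envelope sketch of that FE-side cost; the numbers and the instance-count formula below are illustrative assumptions, not measurements from this PR.

public class InstanceCountSketch {
    public static void main(String[] args) {
        int fragments   = 3;  // plan fragments in one query (assumed)
        int backends    = 10; // BE hosts executing each fragment (assumed)
        int parallelism = 4;  // instances per host with parallel scan (assumed)

        int before = fragments * backends;               // roughly 1 instance per host
        int after  = fragments * backends * parallelism; // fans out per host

        System.out.printf("PlanFragmentInstance requests per query: %d -> %d%n",
                before, after);
        // At the 20~30 QPS per FE reported below, the FE's serialization and
        // network work scales by the same factor.
    }
}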

kangkaisen (Contributor, Author):

> For simplicity, I think we can parallelize all queries.

OK.

> which costs the FE more CPU to serialize them and more network bandwidth to transfer them.

Yes, this is a side effect of this improvement, but it should be acceptable.

kangkaisen (Contributor, Author):

I applied this PR to our prod env on March 29. The following is the FE CPU monitor:

[image: FE CPU monitor]

imay (Contributor) commented Apr 2, 2019:

> I applied this PR to our prod env on March 29. The following is the FE CPU monitor:
> [image: FE CPU monitor]

What was the QPS for this cluster?

kangkaisen (Contributor, Author):

> What was the QPS for this cluster?

20~30 per server.

imay merged commit 2a3bf58 into apache:master on Apr 9, 2019
liaoxin01 pushed a commit to liaoxin01/doris that referenced this pull request Nov 18, 2022
* [Feature-WIP](inverted) add inverted index compaction implementation (apache#851)

* [Feature-WIP](inverted) add inverted index compaction implementation

* [Feature-WIP](inverted) set condition for rowid vec trans

* [Fix](inverted) remove useless function

* [Fix](inverted) separate compound directory fs and cfs, for cloud mode and other situations

Co-authored-by: airborne12 <[email protected]>