Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AMORO-3239] Fix stack overflow caused by reading too many partitions in the filter #3240

Merged
merged 12 commits into from
Oct 16, 2024

Conversation

7hong
Copy link
Contributor

@7hong 7hong commented Oct 11, 2024

… in the filter

Why are the changes needed?

Close #3239 .

Brief change log

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run test locally before making a pull request

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@lintingbin
Copy link
Contributor

I think we should solve this problem, rather than set a maximum number of partitions to bypass it, right? It's quite common for our online operations to have several hundred partitions.

@zhoujinsong
Copy link
Contributor

When too many partitions should be optimized, the partition filter seems to be useless.
We may use the alwaysTrue filter for that case, HDYT? @7hong @lintingbin

@lintingbin
Copy link
Contributor

When too many partitions should be optimized, the partition filter seems to be useless. We may use the alwaysTrue filter for that case, HDYT? @7hong @lintingbin

I think it's feasible. @7hong What do you think?

@7hong
Copy link
Contributor Author

7hong commented Oct 11, 2024

When too many partitions should be optimized, the partition filter seems to be useless. We may use the alwaysTrue filter for that case, HDYT? @7hong @lintingbin

I think it's feasible. @7hong What do you think?

I think it's feasible. I will add a parameter self-optimizing.ignore-filter-partition-count, which defaults to 100. Filters are not used when the number of partitions exceeds self-optimizing.ignore-filter-partition-count. What do you think? @zhoujinsong @lintingbin

@zhoujinsong
Copy link
Contributor

I will add a parameter self-optimizing.ignore-filter-partition-count, which defaults to 100. Filters are not used when the number of partitions exceeds self-optimizing.ignore-filter-partition-count. What do you think? @zhoujinsong @lintingbin

I am okay with that. But the configuration should be added to the AMS configuration rather than table configuration.

@lintingbin
Copy link
Contributor

How to add AMS configuration, you can refer to: https://github.com/apache/amoro/pull/3193/files @7hong

@7hong
Copy link
Contributor Author

7hong commented Oct 11, 2024

okk ,I'll change the code, thanks to both of you.

@majin1102
Copy link
Contributor

majin1102 commented Oct 12, 2024

Is this a bad usecase of iceberg expression?

I mean partition filter is what we could need in this case, especially for so many partitions and large mount of data per partition. what if we do not use iceberg expressions here and use a set to filter or something else. can we solve the problem?

On the other hand, if we do not filter partitions, the evaluation stage is somehow insiginificant, we could eliminate pending partitions in pendingInput to save DB storage @zhoujinsong @7hong @lintingbin

@zhoujinsong
Copy link
Contributor

Is this a bad usecase of iceberg expression?

I mean partition filter is what we could need in this case, especially for so many partitions and large mount of data per partition. what if we do not use iceberg expressions here and use a set to filter or something else. can we solve the problem?

Yes, It is a bad case to construct iceberg expression with too many conditions.
We can filter the data file by ourselves rather than pass it to iceberg scan, but we cannot get better plan performance, but still save some memory for our optimizing plan process.

We can improve this case in another PR.

On the other hand, if we do not filter partitions, the evaluation stage is somehow insiginificant, we could eliminate pending partitions in pendingInput to save DB storage

Yes, we may drop the partition set in pending state to save our db storage in current implementation.

@majin1102
Copy link
Contributor

majin1102 commented Oct 12, 2024

Is this a bad usecase of iceberg expression?
I mean partition filter is what we could need in this case, especially for so many partitions and large mount of data per partition. what if we do not use iceberg expressions here and use a set to filter or something else. can we solve the problem?

Yes, It is a bad case to construct iceberg expression with too many conditions. We can filter the data file by ourselves rather than pass it to iceberg scan, but we cannot get better plan performance, but still save some memory for our optimizing plan process.

We can improve this case in another PR.

On the other hand, if we do not filter partitions, the evaluation stage is somehow insiginificant, we could eliminate pending partitions in pendingInput to save DB storage

Yes, we may drop the partition set in pending state to save our db storage in current implementation.

‘optimizer.ignore-filter-partition-count’
I think this parameter is hard to describe on documents. since it appears to point to a temporary solution and not quite general.
I suggest two solutions:

  1. no parameter and make filter a hashset, the performance is well enough
  2. use parameter as 'self-optimizing.skip-evaluating-for-partition-count' or something like that, the parameter is meant for evaluating stage and skipping cases, if we do it in another PR, could leave a TODO and keep the proper parameter

@@ -54,6 +54,7 @@ ams:
task-ack-timeout: 30000 # 30s
polling-timeout: 3000 # 3s
max-planning-parallelism: 1 # default 1
ignore-filter-partition-count: 100 # default 100
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this param is not suitable for this place.
It's more suitable for self-optimizing groups.

And I think more proper meaning is 'self-optimizing.skip-evaluating-for-partition-count', WDYT @zhoujinsong

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's appropriate to put it in self-optimizing groups.

But I think skip-evaluating is inappropriate, because this parameter only controls whether to use the filter, not skipping the Evaluator.

Copy link
Contributor

@majin1102 majin1102 Oct 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. I think skip is inappropriate too.

Originally there are no evaluating and filtering. But it is proved that directly planning would cause OOM because all partition files would be cached in memory, which leads to design a evaluating phase before planning(stream planning and store a partition set to avoid memory usage in planning phase). If we do not filter here, that means evaluating is not used or skipped.

Evaluating is a sensible concept and even be revealed on dashboard in the future, however filter is quite a detailed implementation and could be pointed to anything evolved or nothing. For example, partition filter is a fast implementation for evaluting, which has serveral drawbacks fed back:

  1. can not filter anything for non-partition table
  2. too large stores in sysdb
  3. the issue you have encountered

From my view, this PR has done a temporary optimization and the evaluating logic should evolve to resolve issues above, and I'm concerned this parameter pointed to filter is easy to be outdated(some work related to work I am pushing #2596).

How do you think using a set to filter to resolve the issue directly? Or you could maintain this evolation when related issues are raised. That would help a lot

@zhoujinsong
Copy link
Contributor

zhoujinsong commented Oct 14, 2024

I carefully read everyone's discussion and summarized the current situation as follows:

  • Adding too many conditions to iceberg expression is not a good practice.
  • Removing the partition filter for tables with too many partitions that need to be optimized is dangerous, and may lead to memory leak issues.

So I am thinking we may add an AMS property named self-optimizing.max-partition-count to limit the partition count an optimizing process may add.

I think it solved the current issue and limited the memory usage for the optimizing plan.
HDYT? @7hong @lintingbin @majin1102

@majin1102
Copy link
Contributor

I carefully read everyone's discussion and summarized the current situation as follows:

  • Adding too many conditions to iceberg expression is not a good practice.
  • Removing the partition filter for tables with too many partitions that need to be optimized is dangerous, and may lead to memory leak issues.

So I am thinking we may add an AMS property named self-optimizing.max-partition-count to limit the partition count an optimizing process may add.

I think it solved the current issue and limited the memory usage for the optimizing plan. HDYT? @7hong @lintingbin @majin1102

How does it work in evaluating stage and planning stage?

@7hong
Copy link
Contributor Author

7hong commented Oct 14, 2024

I carefully read everyone's discussion and summarized the current situation as follows:

  • Adding too many conditions to iceberg expression is not a good practice.
  • Removing the partition filter for tables with too many partitions that need to be optimized is dangerous, and may lead to memory leak issues.

So I am thinking we may add an AMS property named self-optimizing.max-partition-count to limit the partition count an optimizing process may add.

I think it solved the current issue and limited the memory usage for the optimizing plan. HDYT? @7hong @lintingbin @majin1102

Similar to what I submitted the first time? Solve this problem by limiting the number of partitions added?

    this.partitionFilter =
        tableRuntime.getPendingInput() == null
            ? Expressions.alwaysTrue()
            : tableRuntime.getPendingInput().getPartitions().entrySet().stream()
                .map(
                    entry ->
                        ExpressionUtil.convertPartitionDataToDataFilter(
                            table,
                            entry.getKey(),
                            entry.getValue().stream()
                                .limit(maxPartitionCount)      // here
                                .collect(Collectors.toSet())))
                .reduce(Expressions::or)
                .orElse(Expressions.alwaysTrue());

@zhoujinsong
Copy link
Contributor

That's true, but there is a small difference: during the evaluate phase, we limit the set of partitions in the pending input to only 100 (default), rather than truncating it when using it. This can help reduce storage overhead.

@7hong
Copy link
Contributor Author

7hong commented Oct 14, 2024

That's true, but there is a small difference: during the evaluate phase, we limit the set of partitions in the pending input to only 100 (default), rather than truncating it when using it. This can help reduce storage overhead.

Yes, you are right. It is a better solution to truncate during the pending stage.
But there is a small problem: the File Count displayed on the dashboard will be unrealistic

@7hong
Copy link
Contributor Author

7hong commented Oct 14, 2024

I added a parameter refresh-tables.max-pending-partition-count. Used to limit the maximum length of pendingInput.
Therefore the semantics of File Count and File Size on the dashboard will change (tables in pending state)

Copy link
Contributor

@zhoujinsong zhoujinsong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Thanks for the contribution!

@7hong 7hong requested a review from majin1102 October 16, 2024 02:01
Copy link
Contributor

@majin1102 majin1102 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for this contribution.

@majin1102 majin1102 merged commit cc29688 into apache:master Oct 16, 2024
4 checks passed
@majin1102
Copy link
Contributor

I added a parameter refresh-tables.max-pending-partition-count. Used to limit the maximum length of pendingInput. Therefore the semantics of File Count and File Size on the dashboard will change (tables in pending state)

It looks fine.
The 'File Count' and 'File Size' means pending metrics, not meant to be actually planned next round

zhoujinsong pushed a commit that referenced this pull request Oct 30, 2024
… in the filter (#3240)

* [AMORO-3239] Fix stack overflow caused by reading too many partitions in the filter

* [AMORO-3239] Add the "ignore-filter-partition-count" parameter

* move parameter "optimizer.ignore-filter-partition-count" to "self-optimizing.skip-filter-partition-count"

* move parameter "self-optimizing.skip-filter-partition-count" to "refresh-tables.max-pending-partition-count"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Planning table failed, java.lang.StackOverflowError: null
4 participants