fix large string val allocation failure, #3600 #3601

Closed

kangpinghuang
Contributor

A large bitmap needs StringVal to allocate a block of memory whose size is larger than MAX_INT.
The resulting overflow of the length causes bitmap serialization to fail.
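
To make the overflow concrete, here is a minimal sketch rather than the actual Doris code. It assumes the length field is a 32-bit signed integer, as the description implies. `fits_in_int32_len` and `kLargeBitmapBytes` are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of Doris): reports whether a byte count
// can be stored in a 32-bit signed length field without overflowing.
constexpr bool fits_in_int32_len(std::uint64_t num_bytes) {
    return num_bytes <=
           static_cast<std::uint64_t>(std::numeric_limits<std::int32_t>::max());
}

// An illustrative serialized bitmap size of 3 GiB (assumed value).
constexpr std::uint64_t kLargeBitmapBytes = 3ULL * 1024 * 1024 * 1024;

// 3 GiB is 3221225472 bytes, which exceeds the 32-bit maximum of 2147483647,
// so a 32-bit length cannot describe this buffer.
static_assert(!fits_in_int32_len(kLargeBitmapBytes),
              "3 GiB does not fit in a 32-bit signed length");
```

A 64-bit length field holds the same value without loss, which is the motivation for widening the type.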

@EmmyMiao87
Contributor

Please add a unit test for large StringVal values.

@chaoyli chaoyli self-assigned this May 15, 2020
@chaoyli chaoyli added the area/storage/in-memory Issues or PRs related to the memory storage engine label May 15, 2020
@chaoyli chaoyli removed the area/storage/in-memory Issues or PRs related to the memory storage engine label May 15, 2020
@@ -458,7 +458,8 @@ StringVal BitmapFunctions::bitmap_from_string(FunctionContext* ctx, const String
     }

     std::vector<uint64_t> bits;
-    if (!SplitStringAndParse({(const char*)input.ptr, input.len}, ",", &safe_strtou64, &bits)) {
+    // TODO: I think StringPiece's len should also be uint64_t
+    if (!SplitStringAndParse({(const char*)input.ptr, (int)input.len}, ",", &safe_strtou64, &bits)) {
Contributor

Why not change it to int64 too?

Contributor Author

Because it relates to another struct, StringPiece, which is defined in gutil, and too much code references it. I think whether to change StringPiece needs to be discussed and, if we decide to, it can be done in another PR.
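
As long as the string-view type keeps a 32-bit length, one way to avoid silent truncation at the cast is a checked conversion. The following is a sketch under stated assumptions, not the gutil API: `NarrowPiece` is a hypothetical stand-in for a view with an `int`-sized length, `make_narrow_piece` is a hypothetical helper, and the code assumes C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical stand-in for a string view whose length is a 32-bit int.
// This is NOT the real gutil StringPiece.
struct NarrowPiece {
    const char* ptr;
    std::int32_t len;
};

// Returns a NarrowPiece only when `len` fits in 32 bits; otherwise returns
// std::nullopt so the caller can report an error instead of truncating.
inline std::optional<NarrowPiece> make_narrow_piece(const char* ptr,
                                                    std::uint64_t len) {
    if (len > static_cast<std::uint64_t>(
                  std::numeric_limits<std::int32_t>::max())) {
        return std::nullopt;
    }
    return NarrowPiece{ptr, static_cast<std::int32_t>(len)};
}
```

With a helper like this, an input longer than 2 GiB would be rejected explicitly, whereas a plain `(int)` cast would produce a wrong length.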

@morningman morningman added area/udf Issues or PRs related to the UDF kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API labels May 16, 2020
@kangkaisen
Contributor

@kangpinghuang Hi, do you mean the bitmap size is larger than 2 GB? Are you sure that is reasonable?

@kangpinghuang
Contributor Author

Yes. There is a user who uses the bitmap to do count distinct. In his situation, the dimension is about 30 billion, and the bitmap I see is about 3 GB.

@kangkaisen
Contributor

I see, thanks. Have you tested this PR in a prod env for a long time?

I think simply changing the StringVal size from int to uint64_t may cause a lot of issues:

For example, how do we handle network transfer? Can MemPool handle a huge 3 GB allocation well? Does the exchange between StringVal and Slice still work well?
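
On the network transfer question, one common approach (not something this PR or Doris is confirmed to do) is to send an oversized payload in pieces, each small enough for a 32-bit length. The sketch below only plans the chunk boundaries. `plan_chunks` and its parameters are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Splits a payload of `total_bytes` into (offset, length) pairs whose
// lengths never exceed `max_chunk_bytes`. Hypothetical helper, shown only
// to illustrate the idea of chunked transfer.
inline std::vector<std::pair<std::uint64_t, std::uint64_t>>
plan_chunks(std::uint64_t total_bytes, std::uint64_t max_chunk_bytes) {
    std::vector<std::pair<std::uint64_t, std::uint64_t>> chunks;
    if (max_chunk_bytes == 0) {
        return chunks;  // No valid plan; the caller must treat this as an error.
    }
    for (std::uint64_t offset = 0; offset < total_bytes;) {
        const std::uint64_t len =
            std::min(max_chunk_bytes, total_bytes - offset);
        chunks.emplace_back(offset, len);
        offset += len;
    }
    return chunks;
}
```

For a 3 GiB payload and a 1 GiB chunk limit this yields three chunks, each of which fits in a 32-bit signed length. The allocation and StringVal/Slice conversion concerns would still need separate handling.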

@EmmyMiao87
Contributor

This PR has some errors. The correct PR is #3724.

@EmmyMiao87 EmmyMiao87 closed this May 29, 2020
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Nov 12, 2024
Labels
area/udf Issues or PRs related to the UDF kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API
5 participants